2010 (old posts, page 2)

On transparency in making standards

William Vambenepe points out some familiar bugs:

  • The mailing lists of DMTF working groups are confidential. Even a DMTF member cannot see the message archive of a group unless he/she is a member of that specific group. The general public cannot see anything at all. And unless I missed it on the site, they cannot even know what DMTF working groups exist. It makes you wonder whether Dick Cheney decided to call his social club of energy company executives a “Task Force” because he was inspired by the secrecy of the DMTF (“Distributed Management Task Force”). Even when the work is finished and the standard published, the DMTF won’t release the mailing list archive, even though these discussions can be a great reference for people who later use the specification.
  • Working documents are also confidential. Working groups can decide to publish some intermediate work, but this needs to be an explicit decision of the group, then approved by its parent group, and in practice it happens rarely (mileage varies depending on the groups).
  • Even when a document is published, the process to provide feedback from the outside seems designed to thwart any attempt. Or at least that’s what it does in practice. Having blogged a fair amount on technical details of two DMTF standards (CMDBf and WS-Management) I often get questions and comments about these specifications from readers. I encourage them to bring their comments to the group and point them to the official feedback page. Not once have I, as a working group participant, seen the comments come out on the other end of the process.

GIS industry standards are made in just such a non-transparent, members-only environment. I used to subscribe to the OGC's "mass market" list (private archive, but open to subscription) and tried to engage in some discussion there, but soon realized that although messages from the principals were being cross-posted there, they weren't subscribed themselves and didn't see any responses. I also tried to submit comments to the formal channel and found it to be broken (there's a year-long gap in the archives: it could have been broken for that length of time without anybody noticing). Now that it's fixed, you can see the public comment process doesn't get much use.

Despite this, the OGC's standards enjoy almost absolute buy-in from non-member GIS specialists, particularly those from the open source community who need something – anything – to counter de-facto standardization on ESRI products.

More features like open source Python GIS please

An ESRI user shares his software wish list:

3. Expose numpy in the geoprocessor: The geoprocessor as is right now (9.3.1) uses the excellent numpy module to perform matrix algebra (think of raster manipulation). Yet, when one wishes to run numpy commands, one needs to manually read raster files with GDAL, import them as numpy arrays (default), perform operations, and translate back to raster. ESRI must have modules for dealing with this, and we want them. Why would ESRI want to do this? Right now, raster manipulation through Python is done outside the geoprocessor. Most people turn to open source tools to manipulate data, which leads to less and less users relying on ESRI for this. Why pay when free software will do it? The capability is there, and we need to access it too.

Fewer and fewer users relying on ESRI for this? You say that like it's a bad thing.

Comments

Re: More features like open source Python GIS please

Author: Kay

ESRI already uses GDAL and Numpy, if they would just create a Driver for their GeoDatabases in GDAL/OGR and release it to the gdal-community. Then they could distribute GDAL/OGR-python with their product and all their users would have the power of gdal-numpy available without having to install extra software.

And FOSS-people would be able to use FGDB's.

Everyone Happy.

Re: More features like open source Python GIS please

Author: Sean

An open source GDB driver would make a lot of people happy, but that's not what I'm getting at here. My point is that ESRI users are looking at the features (GDAL, Numpy, etc) and usability (Python) of open source software and wondering "why can't we have more of that?"

Below the buzz

I just stumbled onto this post at ReadWriteWeb:

Google Buzz data can be syndicated out to other services using the standard data formats called Atom, Activity Streams, MediaRSS and PubSubHubbub. That couldn't be more different from Facebook. Google has taken open data standards to battle against a marketplace of competitors that are closed and proprietary to varying degrees. This is a very big deal.

Maybe it is a big deal, and not for the best: Google Buzz the product is falling flat with my sources, who find it too raw, too unpredictable, too much. It takes some of the shine off the underlying architecture.

Plotting GIS shapes

Here's another installment in my series (iterators, revenge of the iterators, features) that considers different Python GIS APIs and environments: plotting geometries with Matplotlib. Matplotlib has a number of functions for plotting and graphing, and they typically take as their first arguments sequences of x and y coordinate values. These sequences are immediately adapted to Numpy arrays. In some sense what we're actually considering here is how well different Python APIs integrate with Numpy.

In every code snippet below we have imported the Matplotlib pylab module and then obtained a native geometry object geom or a feature object in some way (see previous post for details).

ESRI:

x = []
y = []
pnt = geom.Next()
while pnt is not None:
    x.append(pnt.x)
    y.append(pnt.y)
    pnt = geom.Next()
pylab.plot(x, y)

Wrapping the geometry up in a point or vertex iterator would help tidy this up.
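
A minimal sketch of such a wrapper follows. It assumes only what the snippet above shows: that calling Next() returns point objects with x and y attributes, and returns None when the points run out. FakePoint and FakePart are hypothetical stand-ins so the example runs without ArcGIS.

```python
class FakePoint(object):
    """Hypothetical stand-in for a geoprocessor point."""

    def __init__(self, x, y):
        self.x = x
        self.y = y


class FakePart(object):
    """Hypothetical stand-in for a geometry with a Next() method."""

    def __init__(self, coords):
        self._points = iter([FakePoint(x, y) for x, y in coords])

    def Next(self):
        # Return None when exhausted, as the while loop above expects.
        return next(self._points, None)


def vertices(geom):
    """Yield (x, y) tuples from any object with a Next() method."""
    pnt = geom.Next()
    while pnt is not None:
        yield pnt.x, pnt.y
        pnt = geom.Next()


geom = FakePart([(0.0, 0.0), (1.0, 2.0), (2.0, 1.0)])
x, y = zip(*vertices(geom))
```

With a generator like this, the ESRI case would shrink to the same two lines as the other APIs: the zip call above, then pylab.plot(x, y).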

FME:

x, y = zip(*feature.getCoordinates())
pylab.plot(x, y)

No actual geometry object in FME, but this isn't bad at all in combination with zip. Remember that zip(*x) unzips the sequence of items x. Within a function call, the * operator explodes a sequence.
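
For anyone who hasn't met the idiom, here is a small illustration with a made-up coordinate list (the values are arbitrary):

```python
coords = [(0.0, 0.0), (1.0, 2.0), (2.0, 1.0)]

# zip(*coords) is the same call as zip(coords[0], coords[1], coords[2]):
# the * spreads the items of coords out as separate arguments.
x, y = zip(*coords)

# Zipping the two sequences back together recovers the original pairs.
roundtrip = list(zip(x, y))
```

Here x collects the first member of every pair and y the second, which is exactly the shape pylab.plot wants.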

OGR:

x, y = zip(*((geom.GetX(i), geom.GetY(i)) for i in range(geom.GetPointCount())))
pylab.plot(x, y)

Note that using zip gets this done in one pass over the geometry instead of two as in the example linked through OGR above. I'm telling you: idiomatic Python wins.

QGIS:

x, y = zip(*((p.x, p.y) for p in geom.asPolyline()))
pylab.plot(x, y)

geojson:

x, y = zip(*geom.coordinates)
pylab.plot(x, y)

Easier and easier.

Shapely (1.2a6):

x, y = geom.xy
pylab.plot(x, y)

Easiest yet.

Where possible, I've used a Python idiom to compress the code to two lines for each API example. There are a lot of calls in the OGR and QGIS samples and efficiency and clarity are at odds. Since a list is being built no matter what, there's no harm in doing it with more lines of code for those APIs. With Shapely, I've deliberately made it hard to do it in more than two lines. In fact, it's easier to do it in one:

pylab.plot(*geom.xy)

Hubba hubba hubba hubba hubba

Although I'm getting a little weary of "social media" products I'm very interested in the technology and architecture behind Google's Buzz. Once again we see how differently syndication and synchronization are conceived and engineered by the GIS Enterprise and the Google. Federated Geo-Synchronization on the one hand (developed in a clean room behind a license wall) and on the other: HTTP, Atom, Web Linking, and PubSubHubbub.

Overly disruptive to other APIs? Maybe. Certainly it's a reminder that the web is one API that really counts and that makes me a happy boy.

Shapely 1.2a6 with pictures

One thing that Shapely has lacked is one or two dirt simple example programs to keep the API real and help explain its use. I did something about this over the past couple of nights: 1.2a6 includes two easy to understand, easy to run scripts. I hope users profit from them. Myself, I found that they demanded a new and improved API feature. I'll explain.

First, here's an example of using Shapely to construct patches by growing buffer regions out from a set of points and dissolving those regions together as they intersect, and plotting the results with Matplotlib. This is run-of-the-mill GIS stuff, yes, but done in style.

http://trac.gispython.org/lab/raw-attachment/wiki/Examples/dissolve.png

A plate of blue-speckled brains splattered on the floor, or is it just me?

The interesting part of the complete, amply-documented dissolve.py script is here:

import pylab
from shapely.ops import cascaded_union

patches = cascaded_union(spots)

pylab.figure(num=None, figsize=(4, 4), dpi=180)

for patch in patches.geoms:
    x, y = patch.exterior.xy
    pylab.fill(x, y, color='#cccccc', aa=True)
    pylab.plot(x, y, color='#666666', aa=True, lw=1.0)
    for hole in patch.interiors:
        x, y = hole.xy
        pylab.fill(x, y, color='#ffffff', aa=True)
        pylab.plot(x, y, color='#999999', aa=True, lw=1.0)

pylab.text(-25, 25,
    "Patches: %d, total area: %.2f" % (len(patches.geoms), patches.area))

pylab.savefig('dissolve.png')

The xy property is completely new in 1.2a6, inspired by how awkwardly I had to slice and dice coordinates when writing this example against 1.2a5. It provides two Python arrays that are immediately usable with Numpy or Matplotlib. Speaking of Matplotlib: I'd love to know how to fill a patch but not its holes (you'll notice that I'm faking the emptiness of the holes in this example).
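
To make the idea concrete, here is a rough sketch of what such a property has to do, written in plain Python. The Ring class is a hypothetical stand-in that just holds coordinate tuples; this illustrates the idea and is not Shapely's implementation.

```python
from array import array


class Ring(object):
    """Hypothetical stand-in for a geometry holding (x, y) tuples."""

    def __init__(self, coords):
        self.coords = list(coords)

    @property
    def xy(self):
        """Return separate arrays of x and y values."""
        x = array('d')
        y = array('d')
        for cx, cy in self.coords:
            x.append(cx)
            y.append(cy)
        return x, y


ring = Ring([(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 0.0)])
x, y = ring.xy
```

Arrays of doubles like these can be passed directly to functions that expect sequences of numbers, which is the whole point of the property.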

What would you have to go through to pyplot ArcGIS scripting results?

Shapely doesn't just make grey matter go splat, it can also toss brains in the air and pierce them with lasers:

http://trac.gispython.org/lab/raw-attachment/wiki/Examples/intersect.png

Or make a fair facsimile thereof. What's really going on in intersect.py is an analysis of an HTML5 geolocation (latitude, longitude, heading, and speed) trajectory's intersection with a cluster of patches. The intercepted patches are plotted in red and the intersecting segments of the trajectory itself are also plotted in red. Finally, scalar properties of different geometries are used in a text label. The example vector intercepts 2 of the 7 patches along 5 segments with a total length (to one decimal place) of 26.1:

import pylab
from shapely.geometry import LineString

# Represent the following geolocation parameters
#
# initial position: -25, -25
# heading: 45.0
# speed: 50*sqrt(2)
#
# as a line
vector = LineString(((-25.0, -25.0), (25.0, 25.0)))

# Find intercepted and missed patches. List the former so we can count them
intercepts = [patch for patch in patches.geoms if vector.intersects(patch)]
misses = (patch for patch in patches.geoms if not vector.intersects(patch))

pylab.figure(num=None, figsize=(4, 4), dpi=180)

for spot in misses:
    x, y = spot.exterior.xy
    pylab.fill(x, y, color='#cccccc', aa=True)
    pylab.plot(x, y, color='#999999', aa=True, lw=1.0)
    for hole in spot.interiors:
        x, y = hole.xy
        pylab.fill(x, y, color='#ffffff', aa=True)
        pylab.plot(x, y, color='#999999', aa=True, lw=1.0)

for spot in intercepts:
    x, y = spot.exterior.xy
    pylab.fill(x, y, color='red', alpha=0.25, aa=True)
    pylab.plot(x, y, color='red', alpha=0.5, aa=True, lw=1.0)
    for hole in spot.interiors:
        x, y = hole.xy
        pylab.fill(x, y, color='#ffffff', aa=True)
        pylab.plot(x, y, color='red', alpha=0.5, aa=True, lw=1.0)

pylab.arrow(-25, -25, 50, 50, color='#999999', aa=True,
    head_width=1.0, head_length=1.0)

intersection = vector.intersection(patches)
for segment in intersection.geoms:
    x, y = segment.xy
    pylab.plot(x, y, color='red', aa=True, lw=1.5)

pylab.text(-28, 25,
    "Patches: %d/%d (%d), total length: %.1f" \
     % (len(intercepts), len(patches.geoms),
        len(intersection.geoms), intersection.length))

pylab.savefig('intersect.png')
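
The geolocation parameters in the script's comments reduce to the line's two endpoints with a little trigonometry. The sketch below assumes one unit of travel time and a flat plane with y pointing north. The heading is taken as degrees clockwise from north, which is how the geolocation API defines it. The displacement function is a name invented for this illustration.

```python
import math


def displacement(heading_deg, speed, seconds=1.0):
    """Return (dx, dy) for a compass heading measured clockwise from north."""
    theta = math.radians(heading_deg)
    distance = speed * seconds
    # The east component uses sine, the north component uses cosine.
    return distance * math.sin(theta), distance * math.cos(theta)


start = (-25.0, -25.0)
dx, dy = displacement(45.0, 50.0 * math.sqrt(2.0))

# Round away floating point noise before comparing with the script's values.
end = (round(start[0] + dx, 6), round(start[1] + dy, 6))
```

Under those assumptions the end point comes out at (25.0, 25.0), matching the second coordinate pair given to LineString.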

Install GEOS 3.2.0. (Windows users can get it from a PostGIS 1.5 installer, but will have to copy the DLLs to a location one can glean only from looking at shapely/geos.py. YMMV until we have Shapely 1.2 installers.) Then grab the new distribution with easy_install or pip (as well as Numpy and Matplotlib) and give the scripts a try:

$ python /usr/local/bin/dissolve.py
$ python /usr/local/bin/intersect.py

I think this is pretty much the last 1.2 alpha.

Diving into geolocation

Speaking of the open web, here's Mark Pilgrim's take on HTML5 geolocation:

Geolocation is the art of figuring out where you are in the world and (optionally) sharing that information with people you trust. There are many ways to figure out where you are — your IP address, your wireless network connection, which cell tower your phone is talking to, or dedicated GPS hardware that receives latitude and longitude information from satellites in the sky.

You can also pick your location, or any other location at all that suits your needs, from a map using René-Luc's Firefox Geolocater.

GeoWeb blues

Apparently, a lot of the "GeoWeb" is made of blue legos. Say what you will about HTML and SVG, but open web stuff comes less and less with these kinds of nasty surprises and titanic games of chicken thundering around your business.

In which we go into the weeds for some REST

On the descending portion of the hype cycle now it seems that, like a guy in a "Rock Star" t-shirt, a "REST API" most likely isn't. It might be using HTTP as a uniform interface and identifying things with URIs, but then you find it provides text/xml or application/json responses with no links and out-of-band rules for teleporting (you can't call it traversing) to other parts of the API. Tight coupling like that is not what REST is about.

One that's getting very close is GeoServer's Configuration API. It has links from workspaces to datastores to layers, and a non-HTML client should in theory be able to follow them, changing the configuration state of the service in a step-by-step manner, led by the service itself, much in the same way you would through a web browser. All from one bookmarkable URI. This is what REST is about.

I say "in theory" because the GeoServer API doesn't hold water for formats other than HTML. Here's the problem: given a bookmarked URI ending in "workspaces" like http://example.com/workspaces, how does a client determine, using in-band information only, that this URI identifies a resource to which you can POST a new workspace and begin the configuration process? If you're working with a text/html representation of the resource, you'll be shown a form, and away you go, RESTfully. The semantics of forms, and specifically that submitting one sends data to a certain URI, are defined in the text/html media type standard. A client doesn't need any out-of-band information: the form is in the representation, and the semantics are specified by the standard "text/html" value of the Content-Type header, both in-band. Now, if the server sends you back a text/xml response, there's no way for a client to know only from in-band information how it is to act on the response. That it's a certain type of resource (a GeoServer Workspace) because the URI ends in "workspaces" and the representation has a root <workspaces> element? That's out of band. That the bookmarked URI is a "GeoServer workspace bookmark"? That's out of band too.
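
To illustrate the HTML case, here is a minimal sketch of a non-browser client discovering where and how to submit data from the representation alone. The HTML document is hypothetical, not an actual GeoServer response, and FormFinder is a name invented for this example. It uses only the standard library's HTML parser.

```python
from html.parser import HTMLParser


class FormFinder(HTMLParser):
    """Collect the method and action of every form in an HTML document."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.forms = []

    def handle_starttag(self, tag, attrs):
        if tag == "form":
            attrs = dict(attrs)
            # HTML forms default to GET when no method is given.
            method = (attrs.get("method") or "get").upper()
            self.forms.append((method, attrs.get("action")))


# Hypothetical text/html representation of the bookmarked resource.
html = """
<html><body>
  <form method="post" action="http://example.com/workspaces">
    <input type="text" name="name"/>
    <input type="submit" value="Create workspace"/>
  </form>
</body></html>
"""

finder = FormFinder()
finder.feed(html)
```

Everything the client needs, the method and the target URI, ends up in finder.forms without any knowledge of URI patterns.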

AtomPub, on the other hand, holds water because the POST-ability of service resources (for creating new collections) is standardized under the media type "application/atomsvc+xml". If a client GETs a URI and that format comes back, the POST-ability is communicated, in-band. The "application/atom+xml" media type does the same for collections and entries, especially in its specification that an "edit" link tells the client via which resource it modifies entry and collection state. Standardizing on Atom and AtomPub, if you can, is therefore a good bet.
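
As a sketch of what that looks like to a client: RFC 5023 defines the service document as the place where collections are discovered, and a collection's href as the URI that accepts POSTs of new members. The service document below is hypothetical; only the two namespaces come from the Atom and AtomPub specifications.

```python
import xml.etree.ElementTree as ET

APP = "{http://www.w3.org/2007/app}"
ATOM = "{http://www.w3.org/2005/Atom}"

# Hypothetical service document, as served with application/atomsvc+xml.
service_doc = """
<service xmlns="http://www.w3.org/2007/app"
         xmlns:atom="http://www.w3.org/2005/Atom">
  <workspace>
    <atom:title>Example</atom:title>
    <collection href="http://example.com/workspaces">
      <atom:title>Workspaces</atom:title>
    </collection>
  </workspace>
</service>
"""


def collections(xml_text):
    """Return (title, href) for each collection in a service document."""
    root = ET.fromstring(xml_text)
    found = []
    for coll in root.iter(APP + "collection"):
        title = coll.findtext(ATOM + "title")
        found.append((title, coll.get("href")))
    return found


postable = collections(service_doc)
```

A client that understands the media type needs nothing else to know where it may POST.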

What distinguishes REST from other styles is that interaction is driven by in-band information. Loose coupling, evolvability, and longevity are properties of a system that has the hypertext constraint. To get these properties, GeoServer and other APIs need to eliminate the out-of-band communication. They can standardize on media types like HTML or Atom, mint their own media types (application/vnd.geoserver+xml or some such), or use links with standard relations in HTTP headers (aka Web Linking) and push for client support of those.
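
For the Web Linking option, a client would read link targets and relations out of a Link response header. The sketch below is a simplified reading of that header's syntax, not a complete parser, and the header value and relation names are hypothetical choices made for illustration.

```python
import re

# One link-value: <target> followed by zero or more ;name=value parameters.
LINK_VALUE = re.compile(
    r'<([^>]*)>((?:\s*;\s*\w+\s*=\s*(?:"[^"]*"|[^\s";,]+))*)')
PARAM = re.compile(r';\s*(\w+)\s*=\s*(?:"([^"]*)"|([^\s";,]+))')


def parse_link_header(value):
    """Return a list of (target, params) pairs from a Link header value."""
    links = []
    for target, raw_params in LINK_VALUE.findall(value):
        params = {}
        for name, quoted, bare in PARAM.findall(raw_params):
            params[name.lower()] = quoted or bare
        links.append((target, params))
    return links


def targets_for(links, rel):
    """Return the targets whose rel parameter equals the given relation."""
    return [target for target, params in links if params.get("rel") == rel]


# Hypothetical header a configuration API might send with a text/xml response.
header = ('<http://example.com/workspaces>; rel="collection", '
          '<http://example.com/workspaces/topp/datastores>; '
          'rel="related"; type="application/xml"')

links = parse_link_header(header)
related = targets_for(links, "related")
```

The representation body can stay plain XML while the links that drive the interaction travel in the headers.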

Comments

Re: In which we go into the weeds for some REST

Author: Chris Holmes

Thanks for the review Sean, our goal is to make GeoServer as RESTful as possible (indeed when we have the time we'd like to do REST feature access alternative to WFS).

Practically I'm still not sure of the best way forward. Atom does seem better than text/xml, but even if we did that wouldn't we still want to have text/xml representations of resources? Or are you advocating replacing all the text/xml responses with Atom/AtomPub?

As for application/vnd.geoserver+xml - isn't that out of band in its own way? Like developing a client against it you'd still need to know something about that format? Or you're saying it'd just be a better self-documenting one? I'd be interested in your ideas of what exactly that looks like. And again, would it replace text/xml responses?

As for Web Linking, it looks great but is it even accepted yet? Not that we're opposed to implementing a developing standard and encouraging its adoption, but I think things like feature access through REST are higher priority for us. And the idea with that is we'd just add http headers to our text/xml responses? If you want to help us you could sketch out exactly what headers we should add - I think it's pretty easy to add in extra http headers, so if it's not much of an effort we might be able to do it soon.

Re: In which we go into the weeds for some REST

Author: Allan Doyle

Chris asks "As for application/vnd.geoserver+xml - isn't that out of band in its own way? " -- That was my question, too. I was going to ask about "application/atomsvc+xml" instead.

Sean said

AtomPub, on the other hand, holds water because the POST-ability of service resources (for creating new collections) is standardized under the media type "application/atomsvc+xml". If a client GETs a URI and that format comes back, the POST-ability is communicated, in-band.

Isn't that only because RFC 5023 says it's POST-able? Then RFC 5023 is the out-of-band knowledge.

Re: In which we go into the weeds for some REST

Author: Sean

RFC 5023 isn't out-of-band: it and application/atomsvc+xml and application/atom+xml are part of the fabric of the web.

For a different take on the subject you should check out http://www.subbu.org/blog/2009/12/media-types-and-plumbing.

Governance of out of band semantics

Author: Rob Atkinson

There is a significant implication in using a special MIME type that is "part of the fabric" of the web to indicate how a client is supposed to interpret content, and the actions it may take as a result. This basically implies that conformant RESTful semantics are only possible within the governance framework of the web "fabric" - it's not open to application domains to define semantics or behaviour of APIs.

Perhaps application API semantics have to be considered as an out of band (from REST point of view) on top of REST. I.e. REST semantics is an out-of-band part of any application API. This perhaps makes sense, because strong governance (fabric of the web) is useful in an out-of-band context, whereas private application semantics are much more problematic as out-of-band information, because they are hard to discover, formalise consistently, create and interpret.

Re: In which we go into the weeds for some REST

Author: Sean

I really must rethink what I've said about the GeoServer API not holding water after another read of Subbu's post. I've agreed with those who say "application/xml is not the media type you're looking for" (Mark Baker, Jim Webber) if you want to use web links, not XLinks, and that link semantics are conveyed not by the Atom namespace but by the media type. While that's still my take, I should give Subbu's a try. If you stick to the minimum HTTP protocol (Atom invents a somewhat specialized one) and straight XML processing (Atom has distinctly different processing rules, like "must ignore"), application/xml might be fine. At any rate, I think you'd at least need an actual namespace for workspace and data store elements to make this hold water using application/xml.

No mistake: a configuration API is a great place to start getting into the REST style, and GeoServer is getting close to nailing it.

Re: In which we go into the weeds for some REST

Author: Sean

There's another issue in the GeoServer docs, which document a practice of interaction driven through a fixed URI hierarchy (see Fielding's fourth bullet at http://roy.gbiv.com/untangled/2008/rest-apis-must-be-hypertext-driven). There's a protocol implied there that application/xml doesn't begin to hint at. Better: drive interaction through the links that are already present in GeoServer's workspace (and friends) representations. GeoServer is ready for REST in a lot of ways, but documents a contrary usage that will result in unnecessary coupling.
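
A sketch of the difference, using a hypothetical workspaces representation (the element names and URIs are assumptions for illustration, not copied from GeoServer output): the client takes the next URI from a link in the document instead of assembling it from a documented path pattern.

```python
import xml.etree.ElementTree as ET

ATOM_LINK = "{http://www.w3.org/2005/Atom}link"

# Hypothetical representation of a list of workspaces, with links.
workspaces_xml = """
<workspaces xmlns:atom="http://www.w3.org/2005/Atom">
  <workspace>
    <name>topp</name>
    <atom:link rel="alternate" type="application/xml"
               href="http://example.com/workspaces/topp.xml"/>
  </workspace>
</workspaces>
"""


def linked_workspaces(xml_text):
    """Map each workspace name to the href the server supplied for it."""
    root = ET.fromstring(xml_text)
    links = {}
    for ws in root.iter("workspace"):
        link = ws.find(ATOM_LINK)
        if link is not None:
            links[ws.findtext("name")] = link.get("href")
    return links


next_uris = linked_workspaces(workspaces_xml)
```

If the server later moves its workspace resources, a client written this way keeps working, while one that concatenates path segments breaks.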