Links in content

One of the key constraints of RESTful web services is adherence to hypertext as the engine of application state, or as Bill de hÓra says, links in content. AtomPub has this: service documents link to collections; collections link to entries; entries link to their editing resource. Why? For resiliency, evolvability, and longevity. Links in content allow clients and servers to be decoupled; an agent can follow its nose into the service and its contents, and need not be compiled against the service. The service is freer to add, proxy or cache, move, and phase out resources. In theory, the properties of resiliency, evolvability, and longevity are products of the hypertext constraint. This theory is continually tested, and mostly validated, day after day, year after year, on the Web. Roy Fielding wrote in a comment on his blog:

REST is software design on the scale of decades: every detail is intended to promote software longevity and independent evolution. Many of the constraints are directly opposed to short-term efficiency.

If your services aspire to the level of infrastructure, links in content is a better architectural style than one where all clients break when the API changes, or one that demands a client upgrade to get access to any new capabilities.

Service developers often mistake hierarchical URIs for the hypertext constraint. An API with URIs like http://example.com/api/food/meat/red looks clean, but unless there's a resource at http://example.com/api/food/meat/ that explicitly connects – whether using links, forms, or URI templating – clients to the resource at http://example.com/api/food/meat/red (and its sibling "white"), it's only a cosmetic cleaning. The API might as well use http://example.com/api?tag=food&tag=meat&tag=red. I pointed out the lack of links in the very handy New York Times Congress API on Twitter and got a response from a developer. I assert that for the API to be RESTful, there should be links to subordinate "house" and "senate" resources in the response below instead of a server error:

$ curl -v "http://api.nytimes.com/svc/politics/v2/us/legislative/congress/103/?api-key=..."
> GET /svc/politics/v2/us/legislative/congress/103/?api-key=... HTTP/1.1
> Host: api.nytimes.com
>
< HTTP/1.1 500 Internal Server Error
< Content-Type: application/xml; charset=utf-8
< Content-Length: 279
<
<?xml version="1.0"?>
<result_set>
        <status>ERROR</status>
        <copyright>Copyright (c) 2009 The New York Times Company.  All Rights Reserved.</copyright>
        <errors>
                <error>Internal error</error>
        </errors>
        <results/>
</result_set>
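
For contrast, here is a rough sketch of the kind of response I have in mind and of how a client could follow it. The "link" elements, their attributes, and the "house/" and "senate/" paths are invented for illustration; they are not part of the real Congress API. The sketch is written for Python 3.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# URI of the resource that was requested (taken from the curl example).
BASE = "http://api.nytimes.com/svc/politics/v2/us/legislative/congress/103/"

# Hypothetical response body: the <link> elements and their attributes are
# assumptions made for this sketch, not something the service returns.
BODY = """<?xml version="1.0"?>
<result_set>
  <status>OK</status>
  <results>
    <link rel="chamber" title="house" href="house/"/>
    <link rel="chamber" title="senate" href="senate/"/>
  </results>
</result_set>
"""


def subordinate_links(base, body):
    """Map each link title to an absolute URI resolved against base."""
    root = ET.fromstring(body)
    return {
        link.get("title"): urljoin(base, link.get("href"))
        for link in root.iter("link")
    }


links = subordinate_links(BASE, BODY)
for title, uri in sorted(links.items()):
    print(title, uri)
```

The client never builds the "house" or "senate" URIs itself; it only resolves whatever hrefs the server hands it, so the server stays free to move them.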

One of the best examples of links in geospatial service content is ESRI's ArcGIS sample server. It's entirely navigable for an agent such as a web browser. Agents that follow the links in the content can easily tolerate addition and deletion of services, or their move to new URIs. See also the JSON representation of that same resource, http://sampleserver1.arcgisonline.com/arcgis/rest/services/?f=json:

{
  "folders": [
    "Demographics",
    "Elevation",
    "Locators",
    "Louisville",
    "Network",
    "Petroleum",
    "Portland",
    "Specialty"
  ],
  "services": [
    {
      "name": "Geometry",
      "type": "GeometryServer"
    }
  ]
}

The service doesn't make it clear enough that the items in the "folders" and "services" lists are the relative URIs of subordinate resources, but that's clearly the intention. Never mind that the ArcGIS REST API is layered over SOAP services; it's very close to getting the hypertext constraint right and worth emulating and improving upon. ESRI is astronomical units beyond the OGC in applying web architecture to GIS. (Note: the JSON format itself has no link constructs, so JSON APIs are on their own. The lack of a common JSON linking construct is a big deal. As I've mentioned before, it prevents GeoJSON APIs from being truly RESTful.)
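
A minimal Python 3 sketch of how an agent might treat those items as relative URIs follows. The join rules are my assumptions, in particular that a service lives at "name/type" under the catalog; the document itself doesn't say so, which is exactly the gap described above.

```python
from urllib.parse import urljoin

# Catalog URI from the post, without the ?f=json query string.
CATALOG = "http://sampleserver1.arcgisonline.com/arcgis/rest/services/"

# A trimmed copy of the JSON document shown above.
DOC = {
    "folders": ["Demographics", "Elevation"],
    "services": [{"name": "Geometry", "type": "GeometryServer"}],
}


def child_uris(catalog_uri, doc):
    """Resolve folder and service entries against the catalog URI.

    Assumption: folder names are relative paths, and each service is
    found at <name>/<type> below the catalog.
    """
    folders = [urljoin(catalog_uri, name) for name in doc.get("folders", [])]
    services = [
        urljoin(catalog_uri, "%s/%s" % (svc["name"], svc["type"]))
        for svc in doc.get("services", [])
    ]
    return folders + services


for uri in child_uris(CATALOG, DOC):
    print(uri)
```

If the format carried explicit links, none of the guessing in child_uris() would be needed.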

As Fielding points out, constraining clients to crawl your service, instead of compiling against it, can have a performance cost. On the other hand, clients are welcome to optimize by caching the structure of a service for a time/value specified by the server, using the expiration and validation mechanisms built into HTTP/1.1. The extra cost of crawling need not be paid any more often than necessary.
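
The client-side bookkeeping can be small. Below is a sketch in Python 3, assuming the server sends a Cache-Control max-age and an ETag; the CachedDocument class, the header values, and the timestamps are all made up for illustration, and no request is actually sent.

```python
import re


class CachedDocument(object):
    """A cached service document plus the HTTP metadata needed to reuse it."""

    def __init__(self, body, headers, fetched_at):
        self.body = body
        self.etag = headers.get("ETag")
        self.fetched_at = fetched_at
        # Expiration: how many seconds the server says the copy stays fresh.
        match = re.search(r"max-age=(\d+)", headers.get("Cache-Control", ""))
        self.max_age = int(match.group(1)) if match else 0

    def is_fresh(self, now):
        """True while the copy can be used without contacting the server."""
        return (now - self.fetched_at) < self.max_age

    def validation_headers(self):
        """Headers for a conditional GET once the copy has gone stale."""
        return {"If-None-Match": self.etag} if self.etag else {}


# Hypothetical response metadata for a service document fetched at t=0.
doc = CachedDocument(
    '{"folders": [], "services": []}',
    {"Cache-Control": "max-age=3600", "ETag": '"abc123"'},
    fetched_at=0,
)

print(doc.is_fresh(now=1800))    # still inside the one hour lifetime
print(doc.is_fresh(now=7200))    # stale, so revalidate
print(doc.validation_headers())  # send these with the next GET
```

A 304 Not Modified answer to the conditional GET means the cached structure is still good, and the crawl is skipped entirely.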

Finally, consider that you might not even need REST in your API. Seriously, you might not need it. Not every service needs to span many organizations, or support dozens of different clients. Not every service needs to be around for 10, 15, 20 years.

I'm eager to see if the touted GeoCommons API has the hypertext constraint. I'm almost certain it will declare itself to be "RESTful", both because of the zeal of the company's marketing folks, and because its CTO, Andrew Turner, is honestly big on web architecture. If it does, it would be taking a step towards becoming a real spatial data infrastructure.

Update (2009-01-15): I just remembered that Subbu Allamaraju has a related and much more detailed article on describing RESTful applications.

Update (2009-01-20): links are already a requested Congress API feature.

Comments

Re: Links in content

Author: Sean

Right. A GeoJSON API can invent its own linking construct, like ESRI did, but there's a risk of getting it wrong or, at the very least, too different. Until there's a standard, or at least consensus, JSON APIs will tend to be like snowflakes. And again: API developers should consider whether they need REST. An HTTP API that uses GET/POST properly and supports expiration/validation (and maybe even paging) is worthy enough. Hierarchical URIs like the ones of the Congress API do let you add in the hypertext constraint for full REST if you want it, and so have a non-cosmetic advantage, after all.

Re: Links in content

Author: Keyur

KML network links is a great example as well.

linking

Author: Ian Bicking

I don't think GeoJSON is necessarily any more lacking in linking than, say, Atom. XML doesn't have any native sense of a "link", but Atom does -- if you use the link tag, you are creating a link (when using some extension tag, though, it's unclear if you are linking or not). It's a link because the Atom specification (itself built on the XML syntax) defines it as a link. Similarly GeoJSON builds on the JSON syntax, but any linkyness is based on the GeoJSON specification.

Re: Links in content

Author: Sean

Yes, Ian, but we punted when faced with specifying links for GeoJSON; it doesn't have them.

Re: Links in content

Author: Andrew Turner

Indeed, this goes back to the topic from quite a bit ago, and also why I'm such a fan of OpenSearch. It's simpler and more broadly applicable than WADL while giving simple links to the broadest use of any API. And results from there linking to individual resources and methods. The NavigatingWashington site is primarily just embeds, with just a little tinge of what we're doing with the API.

GIS consultancy stimulus proposal

US $1,200,000,000 is a lot of GeoPork. While I think the proposed data sets are pragmatic enough (maybe substitute climate/environmental data for wildlife data), I can't get behind any proposal for this level of public funding that doesn't explicitly put open data in the public's hands ("publicly-accessible" is far too vague). And I shudder to think of $450,000,000 spent to line SOA's casket.

Via APB.

Update (2009-01-13): "shudder to think" is a rather lame cliche that I regret using. Apologies, my dear readers. I really feel more like Captain Blackadder reading orders from General Melchett – which means intense, visceral shuddering, waves of violent fear and loathing, and a hyperbolic analogy invoking unfair stereotypes of the French.

OpenLayers and Djatoka imagery

Hugh Cayless has written an OpenURL image layer for OpenLayers that pulls imagery from Djatoka. I'm eager to see it in action. I've heard other library folks talking about doing this kind of thing with "GeoPDF"; my hope (I'm not speaking for Hugh or UNC) is that they'll take a look at this kind of non-proprietary solution before they do.

More decoration

Christopher Schmidt explains the traditional approach to wrapping functions and methods, one I use regularly; Python's built-in property function, as a decorator, produces read-only properties, but can provide read-write property access when used traditionally.
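
To make the property point concrete, here is a small sketch. The Layer class and its attributes are hypothetical, invented only to show the two forms side by side.

```python
class Layer(object):
    """Hypothetical class used only to contrast the two uses of property."""

    def __init__(self, name):
        self._name = name
        self._srs = "EPSG:4326"

    @property
    def name(self):
        # Decorator form: only a getter is supplied, so this is read-only.
        return self._name

    def _get_srs(self):
        return self._srs

    def _set_srs(self, value):
        self._srs = value

    # Traditional call form: getter and setter are both passed explicitly.
    srs = property(_get_srs, _set_srs)


layer = Layer("roads")
layer.srs = "EPSG:3857"        # allowed: srs has a setter
try:
    layer.name = "rivers"      # not allowed: name has no setter
except AttributeError as err:
    print("read-only:", err)
```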

Are decorators merely cosmetic? I'm of the opinion that some syntaxes are better than others. You're likely to agree that:

>>> 1 + 2
3

is more concise, readable, and intuitive than

>>> int(1).__add__(2)
3

but may not agree that Python's decorators are a syntactic improvement. PEP 318 was hotly debated, but is final; decorators are in, and they'll be expanded in 3.0.

The motivation for decorators is compelling:

The current method of applying a transformation to a function or method places the actual transformation after the function body. For large functions this separates a key component of the function's behavior from the definition of the rest of the function's external interface. For example:

def foo(self):
    perform method operation
foo = classmethod(foo)

This becomes less readable with longer methods. It also seems less than pythonic to name the function three times for what is conceptually a single declaration. A solution to this problem is to move the transformation of the method closer to the method's own declaration. The intent of the new syntax is to replace:

def foo(cls):
    pass
foo = synchronized(lock)(foo)
foo = classmethod(foo)

with an alternative that places the decoration in the function's declaration:

@classmethod
@synchronized(lock)
def foo(cls):
    pass

Even if calling code isn't exactly broken, wrapping a function more than likely changes the function's signature in some way; keeping all signature specification (such as it is in Python) at the head of a function is a good thing and requires some syntax like that of PEP 318. GIS programmers who've come to Python in the past several years via ArcGIS should get with @. If you can't or won't, that's fine too; there's another way, as Christopher shows.

On "prettier code": all else being equal, prettier code is more readable code. It's code that can teach, that can be more easily modified by others. In some ways, better code.

One downside of the decorator syntax: the ability to test decorators in a doctest eludes me. The following:

def noisy(func):
    """
    >>> @noisy
    >>> print foo()
    Blah, blah, blah
    1
    """
    def wrapper(*args):
        print "Blah, blah, blah"
        return func(*args)
    return wrapper

@noisy
def foo():
    return 1

fails:

Exception raised:
    Traceback (most recent call last):
    ...
       @noisy

    ^
     SyntaxError: unexpected EOF while parsing

Could be ignorance on my part.

Comments

Re: More decoration

Author: Christopher Schmidt

def noisy(func):
    """
    >>> @noisy
    ... def foo():
    ...     return 1
    >>> foo()
    Blah, blah, blah
    1
    """
    def wrapper(*args):
        print "Blah, blah, blah"
        return func(*args)
    return wrapper

Re: More decoration

Author: Sean

Moving foo inside the docstring doesn't help. It was being found by doctest before.

Re: More decoration

Author: Christopher Schmidt

I guess I don't know what you're trying to do. I'm not trying to test 'foo', I'm trying to test 'noisy'. So I define a 'foo' that uses 'noisy', and I test that 'foo' does what I want. (In this case, foo does nothing except return '1'; this would still be the case even if the real 'foo'.
disciplina:~ crschmidt$ python foo.py -v
Trying:
    @noisy
    def foo():
        return 1
Expecting nothing
ok
Trying:
    foo()
Expecting:
    Blah, blah, blah
    1
ok
1 items had no tests:
    __main__
1 items passed all tests:
   2 tests in __main__.noisy
2 tests in 2 items.
2 passed and 0 failed.
Test passed.
So, I guess I don't know what you're trying to do.

Re: More decoration

Author: Sean

Ah, I finally see what's up. My original docstring text
  >>> @noisy
  >>> def foo():
  ...
is invalid Python, which is obvious if you type it into a prompt:
  >>> @noisy
  ...
Thanks for the help, Christopher. And yes, best to define the foo mock within noisy's docstring test.

How to decorate Python GIS code

Last month I blogged about Python logging and how to avoid using print statements in geoprocessing code. But your crufty old code isn't going to rewrite itself, and you're overworked already. An efficient fix would be optimal, and I've got one that only requires a little time to learn how to use Python decorators.

Say you have a module and function that does some geoprocessing work and prints various messages along the way. Something like this:

def work():
    print "Starting some work."
    print "Doing some work ..."
    print "Finished the work ..."

if __name__ == "__main__":
    work()

which, when run, produces output in your terminal.

$ python work.py
Starting some work.
Doing some work ...
Finished the work ...

Now, your function is much more gnarly than work(), and rewriting it will only sap your goodwill toward its author. You'd think it would be possible to somehow wrap the work() function, catching those print statements and redirecting them to a logger – while not breaking code that calls work() – all in a reusable fashion. And it is possible, using a decorator like the 'logprints' class in the code below:

import logging
from StringIO import StringIO
import sys

logging.basicConfig(
    level=logging.DEBUG,
    format='%(asctime)s %(levelname)s %(message)s',
    filename='work.log',
    filemode='w'
    )

class logprints(object):

    def __init__(self, func):
        # Called when function is decorated
        self.func = func

    def __call__(self, *args, **kwargs):
        # Called when decorated function is called

        # save reference to stdout
        saved = sys.stdout

        # make a string buffer and redirect stdout
        net = StringIO()
        sys.stdout = net

        # call original function; restore stdout even if it raises
        try:
            retval = self.func(*args, **kwargs)
        finally:
            sys.stdout = saved

        # read captured lines and log them
        net.seek(0)
        for line in net.readlines():
            logging.info(line.rstrip())

        # return original function's return value(s)
        return retval

@logprints
def work():
    print "Starting some work."
    print "Doing some work ..."
    print "Finished the work ..."

if __name__ == "__main__":
    work()

The statement "@logprints" is interpreted as "decorate the immediately following function with the 'logprints' class." On import of this module, the method logprints.__init__() is called with 'work' as the sole argument. Afterwards, whenever work() is called, logprints.__call__() is what actually runs. That method acts as a proxy for the original, now decorated, function. Here is the print capturing and logging decorator in action:
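
If the order of events is hard to picture, this stripped-down sketch records it. The 'traced' class and the 'events' list are hypothetical stand-ins for 'logprints', with the logging removed.

```python
events = []


class traced(object):
    """Same shape as a class-based decorator, minus the real work."""

    def __init__(self, func):
        # Runs once, at the moment the def statement below is decorated.
        events.append("decorated %s" % func.__name__)
        self.func = func

    def __call__(self, *args, **kwargs):
        # Runs every time the decorated name is called.
        events.append("called %s" % self.func.__name__)
        return self.func(*args, **kwargs)


@traced
def work():
    return "done"


events.append("module body finished")
result = work()
print(events)
```

The "decorated" entry lands in the list before the module body finishes; the "called" entry only appears once work() is invoked.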

$ python work2.py
$ cat work.log
2008-12-30 12:14:44,044 INFO Starting some work.
2008-12-30 12:14:44,044 INFO Doing some work ...
2008-12-30 12:14:44,044 INFO Finished the work ...

Yes, you could have redirected the output of the original script in the terminal, but remember that Python's logging module sets you up to do much more.
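
For readers on Python 3, the same idea can be sketched with the standard library's contextlib.redirect_stdout. This is a swapped-in variant, not the code above: it is function-based rather than class-based, and the "work" logger name is my own choice.

```python
import contextlib
import functools
import io
import logging

logger = logging.getLogger("work")  # hypothetical logger name


def logprints(func):
    """Capture anything func prints and send it to the logger instead."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        buffer = io.StringIO()
        try:
            # stdout is restored when the with block exits, even on error.
            with contextlib.redirect_stdout(buffer):
                return func(*args, **kwargs)
        finally:
            # Log whatever was captured, one record per printed line.
            for line in buffer.getvalue().splitlines():
                logger.info(line.rstrip())

    return wrapper


@logprints
def work():
    print("Starting some work.")
    print("Doing some work ...")
    print("Finished the work ...")


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    work()
```

Unlike the original, this version also logs the captured lines when the wrapped function raises, which is often when you want them most.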

I've recently learned how to use parameterized decorators by following the examples in Bruce Eckel's article. I'm using one to deprecate functions in Shapely:

import warnings

class deprecated(object):

    """Mark a function deprecated.
    """

    def __init__(self, version="'unknown'"):
        self.version = version
        self.msg_tmpl = "Call to deprecated function '%s', to be removed in version %s"

    def __call__(self, func):
        def wrapping(*args, **kwargs):
            warnings.warn(self.msg_tmpl % (func.__name__, self.version),
                          DeprecationWarning,
                          stacklevel=2
                          )

            return func(*args, **kwargs)
        wrapping.__name__ = func.__name__
        wrapping.__doc__ = func.__doc__
        wrapping.__dict__.update(func.__dict__)
        return wrapping

Marking a function deprecated like:

>>> from shapely.deprecation import deprecated
>>> @deprecated(version="1.1")
... def foo():
...     return None
...

causes a warning to be emitted when the function is called:

>>> foo()
/Users/seang/code/gispy-lab/bin/labpy:1:
DeprecationWarning: Call to deprecated function 'foo',
to be removed in version 1.1

Deprecation-marking decorators are a great solution (which I first saw used, in a different form, in Zope 3). Why would you want to rewrite a function that's going away in the next software version?
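
One way to check that the warning really fires is to record it with warnings.catch_warnings. The sketch below uses a compact, function-based variant of the decorator so that it stands alone; it leans on functools.wraps instead of copying __name__, __doc__, and __dict__ by hand, and it is not the Shapely code.

```python
import functools
import warnings


def deprecated(version="unknown"):
    """Function-based variant of the class above, for illustration only."""

    def decorate(func):
        @functools.wraps(func)  # copies __name__, __doc__, and __dict__
        def wrapping(*args, **kwargs):
            warnings.warn(
                "Call to deprecated function '%s', to be removed in version %s"
                % (func.__name__, version),
                DeprecationWarning,
                stacklevel=2,
            )
            return func(*args, **kwargs)

        return wrapping

    return decorate


@deprecated(version="1.1")
def foo():
    return None


# Record warnings instead of printing them, so they can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    foo()

print(caught[0].category.__name__)
print(caught[0].message)
```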

Decorators can also be chained. In Shapely I've factored the check for non-nullness of GEOS geometries into a decorator and chain it with the built-in property decorator:

@property
@exceptNull
def geoms(self):
    return GeometrySequence(self, LineString)

To this effect:

>>> from shapely.geometry import MultiPoint
>>> m = MultiPoint()
>>> m.geoms
Traceback (most recent call last):
...
ValueError: Null geometry supports no operations

The exception is raised by the 'exceptNull' decorator.
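
A decorator of that kind could look roughly like the sketch below. It is not Shapely's implementation: the '_geom' attribute test and the FakeMultiPoint class are assumptions made so the example runs without GEOS.

```python
import functools


def exceptNull(func):
    """Raise ValueError when the instance has no underlying geometry."""

    @functools.wraps(func)
    def wrapper(self, *args, **kwargs):
        if self._geom is None:  # hypothetical test for a null geometry
            raise ValueError("Null geometry supports no operations")
        return func(self, *args, **kwargs)

    return wrapper


class FakeMultiPoint(object):
    """Stand-in for a geometry class, backed by a plain list."""

    def __init__(self, coords=None):
        self._geom = list(coords) if coords else None

    # property is applied last, so it wraps the null-checking function.
    @property
    @exceptNull
    def geoms(self):
        return list(self._geom)


print(FakeMultiPoint([(0.0, 0.0), (1.0, 1.0)]).geoms)
try:
    FakeMultiPoint().geoms
except ValueError as err:
    print(err)
```

Decorators apply bottom-up: exceptNull wraps the function first, and property then wraps the result, which is why the check runs on every attribute access.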

Not much specifically about GIS here, I'll admit, but GIS programming in Python is, or should be, just Python programming. Feel free to comment if you see any interesting applications of decorators.

Comments

Re: How to decorate Python GIS code

Author: brentp

nice logging decorator. i believe it's also good form to use the decorator module: http://pypi.python.org/pypi/decorator to reduce boiler-plate. and it gives you decorator.decorator to decorate your decorators.

ESRI users discover setuptools and easy_install

My work is done. Or, at least, the part of my work not involved with deprogramming OGC web services cult members. And the part of my work not involved with tooting my own horn. For example, check out this blog post from 2005 (2005!) on emailing Python script errors. Prescient, huh? Too bad I didn't write "or send a message not exceeding 140 characters -- a 'tweet', so to speak -- to your 'followers'." instead of "or ping your enterprise's paging system." If I had done that, you'd never hear the end of it.

I can has Python and GIS environments?

I've spent this short week tuning up my new laptop's development environment, and a side effect of this work is a new build system for replicable, isolated Python, GIS, and image/raster processing environments. Ichpage replaces Gdawg on my machine. It supplies:

  • GDAL (osgeo.gdal, etc)
  • geojson
  • geopy
  • keytree
  • lxml
  • Numpy
  • PIL
  • PyProj
  • Rtree
  • Shapely

and their various library dependencies (libgdal, libgeos_c, libspatialindex, libxml2, libxslt). To get started, clone or get the tarball, cd into the directory, and execute:

$ virtualenv .
$ source ./bin/activate
(ichpage)$ python bootstrap.py
(ichpage)$ buildout
(ichpage)$ . ./setenv
(ichpage)$ labpy
>>> from osgeo import gdal
>>> from shapely.geometry import Point
...

To-do tasks include linking the GDAL utilities into the environment's bin directory, adding WorldMill, and perhaps adding matplotlib. For now, it's a way for me to manage C libs while I develop Shapely and Rtree, and perhaps useful to other geospatial Python developers.

Comments

Re: I can has Python and GIS environments?

Author: Kurt

keytree?

Re: I can has Python and GIS environments?

Author: Sean

Keytree is a little KML helper for use with Python ElementTree APIs.

Preserving first-generation web/GIS projects

Check out this interesting article about the reanimation of an orphaned plant database and its associated ArcIMS instance. The analysis of the issues is sound. I disagree, of course, with their conclusion that ArcIMS is something worth learning and deploying in 2008, and this raises in my mind another issue that the authors did not identify: is not the project's data and its provenance the thing that is most important to preserve? Must the interface cruft around it be preserved in anything other than an archived form, if at all? The ArcIMS user interface and the species database browser are no kind of programmable web APIs; it's unlikely any other application would be broken by a switch to some free web mapping framework or modern search interface.

Now there's a question: switch to what? If this story is just the beginning, and bigger boxes of used, discarded, but potentially useful first-generation web/GIS projects end up in the laps of librarians, a turnkey (and open source, naturally) ArcIMS to MapServer/MapGuide migration tool might be a handy thing. I wouldn't be surprised if such a thing existed. Its authors might want to consider pitching it to GIS librarians in higher education.

Comments

Re: Preserving first-generation web/GIS projects

Author: Jason Birch

That's just freaking bizarre. I was lying in bed last night thinking about whether it would be hard to write an AXL to MapGuide XML transformation.

Re: Preserving first-generation web/GIS projects

Author: James Fee

Wow, deploy ArcIMS in 2008/2009. This is why ESRI can't kill ArcIMS, folks still want to use the darn thing. As much as Jack can get up on stage and basically say the thing is deprecated, people still refuse to listen. Of course part of the problem is ESRI is willing to continue selling licenses to people to deploy it, but I suppose in a higher ed setting site licenses abound and they could have easily gone with the ESRI RESTful API if they wanted to stay ESRI.

Geojson 1.0.1

Geojson 1.0.1 fixes a bug in serialization of features with no geometry.

Comments

Re: Geojson 1.0.1

Author: Stefano Costa

Sean, I'm trying to update, but:
steko@gibreel:~$ sudo easy_install -U geojson
Searching for geojson
Reading http://pypi.python.org/simple/geojson/
Reading http://trac.gispython.org/lab/wiki/GeoJSON
Reading http://trac.gispython.org/projects/PCL/wiki/GeoJSON
Best match: geojson 1.0.1
Downloading http://pypi.python.org/packages/source/g/geojson/geojson-1.0.1.tar.gz#md5=c594cd40085987eafec38f457ff8db49
Processing geojson-1.0.1.tar.gz
Running geojson-1.0.1/setup.py -q bdist_egg --dist-dir /tmp/easy_install-tDg03o/geojson-1.0.1/egg-dist-tmp-Y-zuud
error: VERSION.txt: No such file or directory
On Debian Sid, not sure if it's a platform-specific problem or a bug in the package.

Re: Geojson 1.0.1

Author: Sean

Stefano, I made the release yesterday from my new Mac, and the sdist was truly broken. I've uploaded a new sdist (made on the same Linux box as 1.0.0) that appears to be fine.

Re: Geojson 1.0.1

Author: Stefano Costa

The current sdist works fine, thanks. BTW, it's just too bad that easy_install doesn't support removing an installed egg. Of course the best option would be to have some apt-gettable packages in DebianGIS, but one thing at a time...

Re: Geojson 1.0.1

Author: Sean

I agree with you about uninstalling. Ian Bicking's pip aims to address that. I disagree about Debian packages being the best option. I fully support Python's goal of being able to distribute and install its own packages.

The return of the scientist

While I was fiddling with a related blog post, word came out about Obama's appointment of John Holdren to the post of White House science advisor. Holdren's Boston Globe op-ed from the summer is now a must read:

The few climate-change "skeptics" with any sort of scientific credentials continue to receive attention in the media out of all proportion to their numbers, their qualifications, or the merit of their arguments. And this muddying of the waters of public discourse is being magnified by the parroting of these arguments by a larger population of amateur skeptics with no scientific credentials at all.

I'm not sure, but I think he might be talking about you, John Christy and Joe Francica. It looks like we're going to start "counting the bears" in earnest. If Francica was taken aback by Obama's comment on the rancid state of Interior, I fear he may need someone to grab him a fainting couch for this.