2009 (old posts, page 1)

In order to form a more perfect union

Paul Ramsey on GEOS 3.1 and PostGIS 1.4 improvements:

Here's a less contrived result, the 3141 counties in the United States. Using the old ST_Union(), the union takes 42 seconds. Using the new ST_Union() (coming in PostGIS 1.4.0) the union takes 3.7 seconds.

Now that's change we can believe in.

Comments

Re: In order to form a more perfect union

Author: Guillaume

Yes, we can !

KML and atom:link

Jason Birch is right in wanting to use rel="alternate" in his KML atom:link, and the OGC KML spec is wrong in limiting us to "rel=related". Andrew Turner has written even more about what you can do with "alternate" links here. I remember commenting that the KML spec's public comment period was a bit short and ill-timed (Christmas 2007). Perhaps this error would have been caught otherwise?
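To illustrate, here's roughly the kind of link Jason is after. A sketch only: the href and document content are invented, and element placement follows my reading of the KML 2.2 schema:

<kml xmlns="http://www.opengis.net/kml/2.2"
     xmlns:atom="http://www.w3.org/2005/Atom">
  <Document>
    <name>Parks</name>
    <!-- the HTML page of which this KML document is an alternate -->
    <atom:link rel="alternate" type="text/html"
               href="http://example.com/parks.html"/>
    <!-- Placemarks, etc. -->
  </Document>
</kml>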

Related: kml description considered harmful

Comments

Re: KML and atom:link

Author: Sean

I don't think that XSD constrains @rel at all. I believe it was probably the intention of the KML spec writers to import all of atom:link and that the language in the OGC KML spec is just erroneous. If developers go to the Atom syntax spec to understand atom:link, they'll be fine.

Re: KML and atom:link

Author: Jason Birch

I checked the schema too, and it didn't appear to place any restrictions. The only reason I ran into this (I don't make a habit of reading specifications) is that Galdos' KML validator picked it up.

Services and web resources

David Smith and I have been discussing web "services" and web "resources". He'd like to use the terms interchangeably, but I feel that's improper. Not all resources are services. Is the HTML page representing this blog post a service? No. Are the images within it services? No. Is my blog a service? No, although it has ambition sometimes. On the other hand, not all services are web resources (CORBA, DCOM, Ice, Twisted, SOAP), and many of the rest are poor web resources. The situation looks a bit like this:

http://farm4.static.flickr.com/3362/3215405911_cab3667f15_o_d.png

What makes a web resource is explained in http://www.w3.org/TR/webarch/. Consider this classic diagram from that document:

http://www.w3.org/TR/webarch/uri-res-rep.png

That's the architecture of the Web summarized in a single picture. Resources are identified by URIs, and agents interact with resources by sending messages to (for example) retrieve their representations. There is harmony and consistency among the three concepts in the picture above. Now consider a similar picture of an OGC web something service, rendered for effect in the same style:

http://farm4.static.flickr.com/3093/3213872061_2f4270b082_o_d.png

(I'm using the GeoBase service as an example because of its high profile. It's typical of WxS service implementations.)

Does the service's "Online Resource" URL (http://wms.geobase.ca/wms-bin/cubeserv.cgi) identify a web service resource? As much as you'd like to think so, it's not immediately clear that this is true. I've put a question mark in the diagram. Dereferencing that URL might provide more information:

seang$ curl -i http://wms.geobase.ca/wms-bin/cubeserv.cgi?
HTTP/1.1 200 OK
Date: Wed, 21 Jan 2009 20:48:08 GMT
Server: Apache/2.0.52 (Red Hat)
Connection: close
Transfer-Encoding: chunked
Content-Type: application/vnd.ogc.se+xml

<?xml version="1.0" encoding="ISO-8859-1"?>
<ServiceExceptionReport version="1.1.3" xmlns="http://www.opengis.net/ows"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.opengis.net/ows ...">
<ServiceException>
CubeSERV-35000: Missing REQUEST parameter
(raised in function handleWmsRequest() of file "main.c" line 422)
</ServiceException>
</ServiceExceptionReport>

The '200 OK' response, in accord with RFC 2616, section 10.2.1, indicates that the response carries a representation of the resource identified by http://wms.geobase.ca/wms-bin/cubeserv.cgi. That representation has content type 'application/vnd.ogc.se+xml' and contains a traceback (running in debug mode or what?). Interpretation: http://wms.geobase.ca/wms-bin/cubeserv.cgi identifies not a service, but a service exception document. An agent can't stick to HTTP/1.1 and interpret this in any other way.

Just to show that this is not just the fault of GeoBase, here's an interaction with another prominent service:

seang$ curl -i http://gisdata.usgs.net/wmsconnector/com.esri.wms.Esrimap?ServiceName=USGS_WMS_NLCD
HTTP/1.1 200 OK
Date: Wed, 21 Jan 2009 20:49:01 GMT
Server: Microsoft-IIS/6.0
X-Powered-By: ASP.NET
Pragma: No-cache
Cache-Control: no-cache
Expires: Wed, 31 Dec 1969 18:00:00 CST
Content-Type: application/vnd.ogc.se_xml
Content-Length: 294

<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<ServiceExceptionReport version="1.1.1">
<ServiceException>
Missing mandatory REQUEST parameter. Possibilities are
{capabilities|GetCapabilities|map|GetMap|feature_info|GetFeatureInfo}
</ServiceException>
</ServiceExceptionReport>

Again, http://gisdata.usgs.net/wmsconnector/com.esri.wms.Esrimap?ServiceName=USGS_WMS_NLCD identifies not a service, but a service exception document.
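For contrast, here's what playing by HTTP's rules might look like. This is a sketch of mine in Python's wsgiref (not anything GeoBase or the USGS actually runs) that answers a malformed request with a 400 and reserves '200 OK' for successful representations:

from cgi import parse_qs
from wsgiref.simple_server import make_server

def wms_front_end(environ, start_response):
    # WMS parameter names are case-insensitive; normalize to upper case.
    params = dict(
        (k.upper(), v) for k, v in
        parse_qs(environ.get('QUERY_STRING', '')).items())
    if 'REQUEST' not in params:
        # The client erred; say so with a 400 instead of wrapping an
        # exception document in a '200 OK'.
        start_response('400 Bad Request', [('Content-Type', 'text/plain')])
        return ['Missing mandatory REQUEST parameter\n']
    # ... dispatch on params['REQUEST'] to GetCapabilities, GetMap, etc. ...
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return ['OK\n']

if __name__ == '__main__':
    make_server('localhost', 8080, wms_front_end).serve_forever()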

Are these OGC web services not web resources at all, or just broken ones that might be patched up with appropriate representations and HTTP status codes? The former, I think: the OGC service architecture originated apart from the Web, and although the web is the primary transport (middleware, as Paul Prescod says, or "DCP" in OGC terms) nowadays, the "Online Resource URL" isn't really a universal identifier in the webarch sense. That's the source of the disharmony among entities in the WMS picture above.

Comments

Re: Services and web resources

Author: Ron

In speaking, I tend to call data sources "services," and information sources "resources." (I know these aren't rigorous at all.) Let's say I provide a service that returns street addresses in a zip code. It's probably going to return too many lines for any effective representation in human terms. That, I would call a "service." It is basically a data-oriented response. Format is documented so that it is usable (XML or json or RonText or whatever,) but it is not a representation by any human measure. It's just a service returning data in a format. Now, a resource (in my mind, anyway) makes some attempt to tune the data for human consumption -- search and sort, paginated html pages, Web 2.0 schnazifications, an adaptation for the tiny screen, or SVG animations -- something to render the data into information for humans. Perhaps, in Bateson's terms, a "service" just reports a difference; in order to provide a "resource," you have to represent a difference that difference makes.

Oops, that was an illustration!

Author: Ron

Let me try it again with a little structure:

In speaking, I tend to call data sources "services," and information sources "resources." (I know these aren't rigorous at all.)

Let's say I provide a service that returns street addresses in a zip code. It's probably going to return too many lines for any effective representation in human terms. That, I would call a "service."

It is basically a data-oriented response. Format is documented so that it is usable (XML or json or RonText or whatever,) but it is not a representation by any human measure. It's just a service returning data in a format.

Now, a resource (in my mind, anyway) makes some attempt to tune the data for human consumption -- search and sort, paginated html pages, Web 2.0 schnazifications, an adaptation for the tiny screen, or SVG animations -- something to render the data into information for humans.

Perhaps, in Bateson's terms, a "service" just reports a difference; in order to provide a "resource," you have to represent a difference that difference makes.

Re: Services and web resources

Author: Sean

Resources provide pages for human agents, services provide data for computational agents? The architecture of the Web (not to mention the Semantic Web) does not make this distinction. It's all resources. Text resources, image resources, audio resources, data resources. The audience of a resource (modulo authentication, authorization, and language) is determined by the content types of its representations.

Re: Services and web resources

Author: Andrew Turner

The WxS model is actually really close to what would be a good operational model here. If you dereference the URI to the 'service', it should return a 200 OK and, instead of an Exception, could return the GetCapabilities document. This way the "resource" is the description of the map (just not in a pretty-picture way, but in a "we have a map with these layers in this area with this title" way, etc.). The capabilities document really isn't a different kind of entity/resource from a PNG or KML. The latter merely contain the actual features, but there is no necessity that a resource directly include all subsequent child resources. An Exception should only be returned if the subsequent query parameters are not valid, and then it would be an HTTP 400 "Bad Request". Unfortunately, I assume the fault lies in designing the spec around implementation details (servers being written to say 200 if there wasn't a failure raised).

Re: Services and web resources

Author: Dave Smith

What's referenced is just a base URI, e.g. http://wms.geobase.ca/wms-bin/cubeserv.cgi? - as such, it's an incomplete URI scheme for the resource, and that's why you get the broken response. Obviously you would then have to ask it for something, http://wms.geobase.ca/wms-bin/cubeserv.cgi?request=getCapabilities or... http://wms.geobase.ca/wms-bin/cubeserv.cgi?SERVICE=WMS&VERSION=1.1.3&REQUEST=GetMap&BBOX=-81.758684,46.435561,-74.056742,50.968655&SRS=EPSG:4326&WIDTH=893&HEIGHT=526&LAYERS=DNEC_250K%3AELEVATION%2FELEVATION&STYLES=&FORMAT=image/png&TRANSPARENT=TRUE Those are the full URI schemes.

For a RESTful service (or any other kind of service) you would similarly need to pass in parameters, e.g. ask it for capabilities, ask it for an image, identify a feature, and so on. This yields an immense number of permutations - different image sizes, different layers, different styling requests, and so on - and this is why a base URI is not so unreasonable, as it provides the base starting point. Perhaps, though, it would make more sense to point to the Capabilities document as that starting point.

The other issue is in connecting to it, "lights-out". This is done via a.) OGC standard and b.) Capabilities statement. These things, capabilities and standards, are crucial for interoperability and enterprise-oriented approaches. In a vacuum, one could build the greatest, most wonderful service in the world, yet it would do anyone else no good if they don't know how to discover and access it consistently.

Re: Services and web resources

Author: Sean

Dave, I agree with you about discovery and access. Happily, webarch and HTTP/1.1 have this covered, and better than any OGC spec: through URIs, links, and the "follow your nose" discovery that crawlers and search engines can exploit. How is using HTTP not "lights-out" access? It's good enough for your feed reading, your web browsing, your Twitter clients ... even WxS and SOAP use HTTP as transport.

You're misusing the term "URI scheme", which is defined in webarch. Our URI schemes are "http", "ftp", "urn", "info", et al. To assert that the WxS "online resource URL" string is a URI scheme is to create immediate conflict with the architecture of the Web. There would be a profusion of WxS URI schemes, one for each service installation (500 or so), all of them extending the "http" scheme in a non-standard way. Talk about "stovepipes". Remember, too, that WxS services are supposed to support POST requests to the thing at that "online resource URL" for capabilities docs and data. You POST to a resource identified by a URI, you can't meaningfully POST to a URI scheme.

I feel that when you write "URI scheme" you're trying to express concepts related to URI templating. See the IETF URI templating draft for the way to do this right, but understand that even if WxS were to do proper URI templating, its "online resource URLs" would have to identify proper resources for POST's sake.

Re: Services and web resources

Author: Dave Smith

With regard to discovery and access, we can discover and access RSS and ATOM feeds only because they have a defined standard - and even that is not without the occasional wrinkle. Similarly, the Twitter API is documented and defined. But you certainly wouldn't be able to immediately figure out how these work without first knowing at least a little bit about the feed/API and its parameters - hence capabilities and standards. What I mean by "lights-out" access is being able to programmatically discover and access with little more than a handful of predefined rules - as opposed to always making a human read docs unique to each API or feed, and write custom code for integration. People want to focus on doing science, analysis and solving business problems, and not writing custom code that might break with each change on the far end. I'd agree that OGC isn't quite there on some of your points, but again, it points up the need for consistency.

Re: Services and web resources

Author: Sean

It doesn't follow from my criticism of WxS that I am against standards in general. I'm strongly in favor of good protocol and format standards; "good" to me meaning that something works well with our global information infrastructure (also known as the "Web"). In this sense, WxS is not so good, though its formats are better than its protocols.

Mocking GEOS

My use of mocks isn't as sophisticated as Dave's, perhaps, but I stumbled onto a simple testing pattern that might be useful to other Python geospatial/GIS developers who are wrapping C libs using ctypes.

Consider Shapely: it wraps the GEOS library, the quality and accuracy of which we take as a given (though not blindly, because I do contribute fixes and enhancements to GEOS). The predicates and topological functions of GEOS are called from within Python descriptors, classes that perform argument validation and handle GEOS errors. For Shapely, I'm testing these descriptors, the GEOS wrappers, not GEOS itself. What pair of geometries would I have to pass to GEOSDisjoint (for example) in order to get the return value of 2 that signifies an error? Even if known, they might be subject to issues of numerical precision, or be sensitive to changes in GEOS. I'd rather not fuss with this. Instead, I want some function to stand in for GEOSDisjoint and friends, one that takes 2 arguments and has very predictable return values in the range (0, 1, 2). A function like libc's strcmp():

>>> import ctypes
>>> libc = ctypes.CDLL('libc.dylib') # this is OS X
>>> libc.strcmp('\0', '\0')
0
>>> libc.strcmp('\1', '\0')
1
>>> libc.strcmp('\2', '\0')
2

With this meaningless but handy isomorphism between strcmp() and the GEOS binary operations in hand, a generic GEOS wrapper can be fully tested like this:

import ctypes
import unittest

from shapely import predicates

# libc's strcmp takes two arguments and, for the inputs used below,
# returns 0, 1, or 2: a handy stand-in for a GEOS binary predicate.
libc = ctypes.CDLL('libc.dylib')  # OS X; use 'libc.so.6' on Linux
BN = libc.strcmp

class CompMockGeom(object):
    # Values chosen with libc.strcmp in mind
    vals = {'0': '\0', '1': '\1', '2': '\2'}
    def __init__(self, cat):
        self._geom = ctypes.c_char_p(self.vals[cat])
    comp = predicates.BinaryPredicate(BN)

class BinaryPredicateAttributeTestCase(unittest.TestCase):

    def test_bin_false(self):
        g1 = CompMockGeom('0')
        g2 = CompMockGeom('0')
        self.assertEquals(g1.comp(g2), False)

    def test_bin_true(self):
        g1 = CompMockGeom('1')
        g2 = CompMockGeom('0')
        self.assertEquals(g1.comp(g2), True)

    def test_bin_error(self):
        g1 = CompMockGeom('2')
        g2 = CompMockGeom('0')
        self.assertRaises(predicates.PredicateError, g1.comp, g2)

Comments

Re: Mocking GEOS

Author: ajfowler

Hi, Unrelated to this post, but I came across one of your old blog posts about web-mapping accessibility. Is this topic off of your radar now? aj

Re: Mocking GEOS

Author: Sean

Yes, but it looks like it's on yours. What's up with web map accessibility?

Re: Mocking GEOS

Author: ajfowler

Well I'm looking into creating a text description of a map. There aren't a lot of resources out there, but I'm avidly searching.

Toward Shapely 1.1

Over the holiday I created a 1.0 branch for Shapely and began working toward Shapely 1.1. The next release will have the same API, but with some new and improved implementations of the same classes and methods, and a few new features. So far, I've managed to cut the code base by about 6% (the less code, the better, I say), not including the new tests written to get coverage to 97%:

Name                               Stmts   Exec  Cover   Missing
----------------------------------------------------------------
shapely                                0      0   100%
shapely.array                         14     12    85%   22-23
shapely.deprecation                   13     13   100%
shapely.factory                      195    195   100%
shapely.geometry                       8      8   100%
shapely.geometry.base                265    265   100%
shapely.geometry.collection           12      8    66%   25-28
shapely.geometry.geo                  48     48   100%
shapely.geometry.linestring           55     55   100%
shapely.geometry.multilinestring      44     44   100%
shapely.geometry.multipoint           82     82   100%
shapely.geometry.multipolygon         59     59   100%
shapely.geometry.point                92     92   100%
shapely.geometry.polygon             177    177   100%
shapely.geometry.proxy                31     31   100%
shapely.geos                          59     32    54%   12-24, 29-38, 40, 45-51, 84, 90-91
shapely.iterops                       30     30   100%
shapely.ops                           24     24   100%
shapely.predicates                    35     35   100%
shapely.topology                      29     29   100%
shapely.wkb                           22     22   100%
shapely.wkt                           22     22   100%
----------------------------------------------------------------
TOTAL                               1316   1283    97%
----------------------------------------------------------------------
Ran 196 tests in 0.836s

FAILED (errors=1, failures=1)

Feel free to grab the new code from its Subversion repository:

$ svn co http://svn.gispython.org/svn/gispy/Shapely/trunk Shapely

By the way, I've used git and git-svn exclusively for my work since the 1.0/1.1 branching, and am becoming a fan.

I remain unenthusiastic about implementing heterogeneous geometry collections. I never use them ... what am I missing?

Comments

Re: Toward Shapely 1.1

Author: Martin Daly

You will be missing completeness, and compatibility with Simple Features. For example, what is the union of a Point and a LineString, if not a GeometryCollection? You might argue that that is a poor-ish example because you can just say "I won't allow that". A better example would be the intersection between LineString-s that are colinear for some portion of their length, and cross at some other point. The result of that - according to Simple Features - is a LineString and a Point, and it would be hard to disallow intersections between LineString-s just because you don't like the resulting geometry type.
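For the record, here's Martin's second example sketched in Shapely. The coordinates are invented, and iterating over the result assumes the heterogeneous collection support under discussion:

from shapely.geometry import LineString

# b runs along a between x=1 and x=2, then doubles back across it at (3, 0)
a = LineString([(0, 0), (4, 0)])
b = LineString([(1, 0), (2, 0), (3, 1), (3, -1)])

x = a.intersection(b)
print x.geom_type                     # GeometryCollection
print [g.geom_type for g in x.geoms]  # a LineString and a Point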

Re: Toward Shapely 1.1

Author: Sean

Okay, thanks Martin. Can you think of an example of how heterogeneous geometries might be part of the same single feature (sharing the same attributes) in a GIS?

Re: Toward Shapely 1.1

Author: Martin Daly

You mean something in practice, not theory? I'm out :) I can't remember having seen an example in data that we have been given. I could probably make something up, but it would be just that: made up. Of course Shapefiles have no provision for geometry collections, so they don't actually exist, right?

So where's the git repo?

Author: Holger

So, where is the public git repository for all of us who don't want to have to learn yet another version control system, just to get to your subversion repo?

Re: Toward Shapely 1.1

Author: Sean

No need to learn svn:

$ git svn clone http://svn.gispython.org/svn/gispy/Shapely/trunk Shapely

Re: Toward Shapely 1.1

Author: Sean

Alright, I'm sticking with heterogeneous collections which might be incidental products of operations, but are otherwise discouraged. And we've reached a limit of code coverage. I can only get to 100% now by fooling the test runner into believing that it's on different platforms that have or do not have numpy.

Name                               Stmts   Exec  Cover   Missing
----------------------------------------------------------------
shapely                                0      0   100%
shapely.array                         14     12    85%   22-23
shapely.deprecation                   13     13   100%
shapely.factory                      195    195   100%
shapely.geometry                       8      8   100%
shapely.geometry.base                265    265   100%
shapely.geometry.collection           12     12   100%
shapely.geometry.geo                  48     48   100%
shapely.geometry.linestring           55     55   100%
shapely.geometry.multilinestring      44     44   100%
shapely.geometry.multipoint           82     82   100%
shapely.geometry.multipolygon         59     59   100%
shapely.geometry.point                92     92   100%
shapely.geometry.polygon             177    177   100%
shapely.geometry.proxy                31     31   100%
shapely.geos                          59     32    54%   12-24, 29-38, 40, 45-51, 84, 90-91
shapely.iterops                       30     30   100%
shapely.ops                           24     24   100%
shapely.predicates                    35     35   100%
shapely.topology                      29     29   100%
shapely.wkb                           22     22   100%
shapely.wkt                           22     22   100%
----------------------------------------------------------------
TOTAL                               1316   1287    97%
----------------------------------------------------------------------
Ran 196 tests in 0.849s

FAILED (errors=1, failures=1)

Open access to National GIS data

A corollary to Jeff Thurston's grammatically challenged geospatial thought for the day:

Let’s be clear: If government pays for geodata, then makes it available for free. Then it is not free. You ARE paying for it.

is this:

If you're paying for it, you own it, and should have the right to unfettered access to unclassified portions of it.

The National Institutes of Health mandates open access to the published results of science it funds. Similar open access to all publicly funded research is currently the 12th-ranked suggestion to Obama's future CTO. An equivalent policy for National GIS data is, in my opinion, a must. I don't mean access to a service endpoint, I mean access to shapefile downloads.

I believe I will write my new Senator, Mark Udall (do I ever love typing that phrase!), and see if he's interested in doing something about it.

Update (2009-01-16): related, more thoughtful post here.

Update (2009-01-28): more from Sean Gorman and Paul Ramsey.

Comments

Re: Open access to National GIS data

Author: Kirk

I don't think I'd like for the public to have access to precise locations of archaeological sites, would you?

Re: Open access to National GIS data

Author: Eric Wolf

I guess people don't read before they make suggestions. The Obama platform specifically cited increased access to Government information as an important goal of his administration. And open access is generally the norm at the USGS. I believe it is by law that USGS-collected data cannot be copyrighted and is free (libre) for any use. Unfortunately, there are so many snafus related to the free (gratis) problem that the bureaucrats get stuck in a tailspin. The past eight years, the Department of the Interior has operated under a mantra of "we must become like a commercial operation" because, as we all know, the market is always right... right?

We also have the technical issue that USGS data is not in shapefile format, because of the magnitude of the data and the diversity of the data types. Most of the data is stored as custom geodatabases, sometimes centralized but frequently distributed. Providing service endpoints is easier than shapefiles, especially for the centralized geodatabases - all we have to do is front-end the database with the appropriate protocol. The Seamless server (http://seamless.usgs.gov/website/seamless/viewer.htm) already provides shapefile downloads. But because of the way the data is stored at the USGS, it must first be extracted from the databases and then turned into a shapefile.

The debate, really, is: would you rather the USGS spend your tax dollars maintaining a database structure (i.e., independent shapefiles like "transportation for Colorado") that doesn't fit the Survey's own internal needs for its mission of furthering environmental science? In the past, the USGS charged for data delivery to help compensate for this difference between internal and external data format needs. Of course, if you take this to the next step, you get the FGDC and SDTS. I won't embarrass myself by going there...

I'd suggest, in addition to writing Udall, also CCing that other famous Colorado politician, Ken Salazar, the incoming Secretary of the Interior. Salazar's role is in interpreting administrative guidelines into policy for the USGS and the rest of the DOI.

Re: Open access to National GIS data

Author: Sean

Kirk, as far as I'm concerned that's another kind of classified. Moot here, because archaeology and cultural heritage aren't part of the National GIS proposal, but the same issue does come up in regard to wildlife habitat. There are people who might bring on the bulldozers upon discovering that their property intersects with endangered species habitat.

There is a mind-blowing cave near my old hometown, Logan, Utah. As a kid I went in there a bunch of times. Increasing numbers of visitors, some who camped inside, built fires, etc, made life hard for Townsend's big-eared bats. The Forest Service tried some seasonal closures to protect the bat population, and some hillbillies (who are probably related to me -- this is Utah, after all) responded by trying to eliminate the bats. The cave is now gated, and closed. Sadly, I don't think this kind of vandalism is particular to the Intermountain West.

The paranoid may say parcel data likewise needs to be kept out of the hands of evil-doers, but I think this is bogus.

Thanks for the Salazar reminder, Eric. I busted my ass for him in 2004, and he owes me a favor ;)

Re: Open access to National GIS data

Author: Dave Smith

There are quite a few different reasons why access might be controlled - not just sensitivity due to national security, archaeological or natural security, but also others - e.g. governmental regulation on business may make government privy to information about a company's business processes, suppliers, and so on - which might otherwise be confidential trade secrets. However, I tend to think that the datasets which genuinely require sensitivity are the exception to the rule. The vast majority should be open and accessible.

Another consideration is that many governmental entities also face unfunded mandates which dictate that they collect and manage data. How to pay for it? Charge users, is one model, unfortunately. Or... don't collect the data at all. Or... rob Peter to pay Paul, and borrow a little funding from another program and get the most basic data collected, which in turn might not be in an easily-sharable form. Many obstacles.

Should USGS be maintaining data outside of their own mandate? Probably not. But meanwhile, can they access said data from DOT/FHWA or other sources in a seamless fashion? Heck no. So everywhere across government, we have all these disconnected little stovepipes, which without the rest of the background data, would generally be of limited utility.

FGDC, "GIS for the Nation", GOS, the OMB Geospatial Line of Business and all of these should be pursuing a national FRAMEWORK for providing this - they have accomplished a few things here and there, but the technical architecture is still sorely lacking. And without sound guidance, governance, and a solid national architecture and framework, the Dangermond proposal could seriously threaten to only propagate the same type of thing. Who manages and houses what? How is the data to be published, discovered and accessed? Technology is not the hurdle. The hurdle is cultural.

Re: Open access to National GIS data

Author: Sean

Eric, did you bring up the Seamless app as an example of how data should be shared? It is so wrong, in so many ways. I'm not counting on the usual suspects to deliver anything better for a national GIS, and that's why I'm saying the USGS should just release the data periodically and let others remix it into useful services.

Re: Open access to National GIS data

Author: Kirk

Sean, I hadn't really thought about wildlife data being classified, but see what you mean. I live not far from Bracken Cave, which seems fairly well protected by BCI ... http://tinyurl.com/brackencave. Notice how "find bat locations" takes you to a page that tells you everything about the cave and its bats - except for the location. I was thinking more about the antiquities databases you build tools for. Does ISAW try to discourage treasure hunters from gaining access?

Re: Open access to National GIS data

Author: Tyler Erickson

There was a fairly good keynote talk related to this subject last December at the AGU Fall Meeting, an academic scientific conference that draws 15,000+ attendees. Michael Jones of Google spoke on spreading scientific knowledge, and one of his main points was that all government funded research should require open publishing of the work (data, source code, and results) so that others can easily reproduce and build upon it. The talk seemed to be well received, given that most of the audience members are dependent on government funding for the majority of their research. At least I hope that they see the big picture and agree with it: if everyone gives away their one precious dataset/algorithm, everyone will have access to thousands of new datasets and algorithms to use in their own research.

Re: Open access to National GIS data

Author: Sean

Kirk, I do think that location obscurity is the very least that digital antiquities people should provide for sensitive archaeological sites that can't be better secured. ISAW projects are different: we aggregate and provide tools for study of already known places, people, and texts. Databases of hidden treasures aren't part of our mission. Ideally, our workflow engine and editorial board publish no material before its time, but that's principally to maintain a high standard of scholarship.

Re: Open access to National GIS data

Author: Dave Smith

Carrying over discussion from Sean Gorman's site - the issue with just providing data (e.g. shapefiles) is that they require download/conversion/etc... a process. In this process, how often do you update? Do you have/provide adequate metadata to know whether or not it's the most current data? Do you then need to build a refresh process, to schedule a mechanism to perform the download and update on your end? Are there going to be dozens of other stakeholders all making redundant investments in the same type of refresh processes?

With Apps for Democracy et al, it was beyond just "data" - it was specifically directly-mashable data feeds, and this can be a means of providing and ensuring currentness, via KML network links, live GeoRSS feeds et al. Part of my concern is in economies of scale (why not build it once, use it many times) and in potential liabilities, e.g. folks who might not be diligent in routinely updating the datasets that feed their apps.

The easiest solution would be to just publish a live feed. Have agencies provide direct data access via KML network link, GeoRSS, WxS services, tile services, e.g. GeoServer. With a modicum of infrastructure planning, this could be quite scalable and robust, and serve a vast majority of need across the entire community. And the data would reside in place with each steward, in a federated NSDI. This is basic stuff, not complicated star-wars physics.

The flipside of the equation is in data collection efforts - e.g. EPA's Exchange Network, which collects data from all 50 states, tribes and other participants. Or... you have OAM, a great idea for crowdsourced data, but what happened there? Again, an infrastructure crunch, needing sponsorship and funding. "Just do it" is all fine and good, but definitely has its practical limits, particularly when dealing with an entire national dataset and applications which require cross-agency and inter-agency data.

With respect to obscuring data, touch base with NatureServe - they are working on ways to allow site screening for sensitive/endangered species without exposing the actual location.

Re: Open access to National GIS data

Author: Sean

I invited trouble by reducing my desire for excellent, standardized, syndicated data to "shapefiles". I am in favor of funding agencies to create, manage, publish (using simple and robust mechanisms like RSS), and curate this data. My only objection is to the proposed shiny service architectures and portals; the GIS industry/community rarely gets that stuff right.

Re: Open access to National GIS data

Author: Eric Wolf

Sean: I brought up Seamless not as an example of how data should be served, but as an example of how the USGS is actively trying to come up with a scheme for providing the diverse range of data it collects, creates and maintains to a diverse user base.

Essentially, the problem is similar to the Census DIME and TIGER files. The Census gives you dumps from the database and a schema to help you decode the data dump. The problem is the USGS doesn't have one database. We have many. And the larger databases are comparable in size and complexity to the Census data. And unlike the Census, which is really only updated once every decade, many of the USGS databases are updated in real-time.

I'm not trying to make excuses. I'm trying to help you understand the challenges. My colleagues and I in CEGIS at the USGS are actively trying to understand how to best manage data dissemination. So we appreciate being told what is wrong with what we are doing and what people actually want.

Re: Open access to National GIS data

Author: Dave Smith

Eric raises another point - EPA has similar flows, e.g. FRS, where the data it contains comes from a large number of disparate stewards, and which, based on varying practices and standards in place with external stewards, may have a host of issues when it arrives, e.g. mismatched datum, reversed lat/long, signs on longitude values, and so on. Further, representation of the "place" may mean very different discrete things - e.g. water outfall, air stack, front gate outside of a plant, and so on - along with other issues which need to be harmonized in order to provide a seamless national dataset of regulated facilities. And as with the USGS databases, these are refreshed on a continual basis. As such, there are hurdles to be overcome before even turning over the data, and that's been half the battle. However, once the data can be gotten to this point, the solutions for delivery become a lot more straightforward, at least in today's terms.

It should also be considered that, for example, EPA's web-based GIS applications began life in the 1990s, when current technologies and architectures were not yet conceived, with many pieces scratch-built. Many functionalities can be and are being replaced with more current technologies - however, again, availability of resources has been an issue. Dealing with complex processes, legacy systems and disparate resources across and outside an enterprise is never as easy as building something new. But hopefully existing efforts and technologies, such as GeoServer, can be employed to provide robust, low-cost infrastructure to serve these types of needs in the future.

Links in content

One of the key constraints of RESTful web services is adherence to hypertext as the engine of application state, or, as Bill de hÓra says, links in content. AtomPub has this: service documents link to collections; collections link to entries; entries link to their editing resource. Why? For resiliency, evolvability, and longevity. Links in content allow clients and servers to be decoupled; an agent can follow its nose into the service and its contents, and need not be compiled against the service. The service is more free to add resources, proxy/cache resources, move resources, phase out resources. In theory, the properties of resiliency, evolvability, and longevity are products of the hypertext constraint. This theory is continually tested, and mostly validated, day after day, year after year, on the Web. Roy Fielding wrote in a comment on his blog:

REST is software design on the scale of decades: every detail is intended to promote software longevity and independent evolution. Many of the constraints are directly opposed to short-term efficiency.

If your services aspire to the level of infrastructure, links in content is a better architectural style than one where all clients break when the API changes, or where clients must upgrade to get access to any new capabilities.
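Concretely, the AtomPub chain begins with a service document along these lines (a minimal sketch; the URI and titles are invented). A client needs only the service document's URI; everything else is discovered by following links:

<service xmlns="http://www.w3.org/2007/app"
         xmlns:atom="http://www.w3.org/2005/Atom">
  <workspace>
    <atom:title>Blog</atom:title>
    <collection href="http://example.com/blog/entries">
      <atom:title>Entries</atom:title>
    </collection>
  </workspace>
</service>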

Service developers often mistake hierarchical URIs for the hypertext constraint. An API with URIs like http://example.com/api/food/meat/red looks clean, but unless there's a resource at http://example.com/api/food/meat/ that explicitly connects clients – whether using links, forms, or URI templating – to the resource at http://example.com/api/food/meat/red (and its sibling "white"), it's only a cosmetic cleaning. The API might as well use http://example.com/api?tag=food&tag=meat&tag=red. I pointed out the lack of links in the very handy New York Times Congress API on Twitter and got a response from a developer. I assert that for the API to be RESTful, there should be links to subordinate "house" and "senate" resources in the response below instead of a server error:

$ curl -v "http://api.nytimes.com/svc/politics/v2/us/legislative/congress/103/?api-key=..."
> GET /svc/politics/v2/us/legislative/congress/103/?api-key=... HTTP/1.1
> Host: api.nytimes.com
>
< HTTP/1.1 500 Internal Server Error
< Content-Type: application/xml; charset=utf-8
< Content-Length: 279
<
<?xml version="1.0"?>
<result_set>
        <status>ERROR</status>
        <copyright>Copyright (c) 2009 The New York Times Company.  All Rights Reserved.</copyright>
        <errors>
                <error>Internal error</error>
        </errors>
        <results/>
</result_set>

One of the best examples of links in geospatial service content is ESRI's ArcGIS sample server. It's entirely navigable for an agent such as a web browser. Agents that follow the links in the content can easily tolerate addition and deletion of services, or their move to new URIs. See also the JSON representation of that same resource, http://sampleserver1.arcgisonline.com/arcgis/rest/services/?f=json:

{
  "folders": [
    "Demographics",
    "Elevation",
    "Locators",
    "Louisville",
    "Network",
    "Petroleum",
    "Portland",
    "Specialty"
  ],
  "services": [
    {
      "name": "Geometry",
      "type": "GeometryServer"
    }
  ]
}

The service doesn't make it clear enough that the items in the "folders" and "services" lists are the relative URIs of subordinate resources, but that's clearly the intention. Never mind that the ArcGIS REST API is layered over SOAP services; it's very close to getting the hypertext constraint right and worth emulating and improving upon. ESRI is astronomical units beyond the OGC in applying web architecture to GIS. (Note: the JSON format itself has no link constructs, so JSON APIs are on their own. The lack of a common JSON linking construct is a big deal. As I've mentioned before, it prevents GeoJSON APIs from being truly RESTful.)
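To make the follow-your-nose point concrete, here's a sketch of an agent crawling one level of that sample server. The "folders" and "services" keys are taken from the representation above; everything else is my assumption:

import json     # Python 2.6's stdlib; use simplejson on older Pythons
import urllib2

BASE = 'http://sampleserver1.arcgisonline.com/arcgis/rest/services'

def get_json(url):
    # Dereference a URI and decode its JSON representation.
    return json.load(urllib2.urlopen(url + '?f=json'))

root = get_json(BASE)
# Treat each folder name as the relative URI of a subordinate resource.
for folder in root.get('folders', []):
    child = get_json('%s/%s' % (BASE, folder))
    print folder, [s['name'] for s in child.get('services', [])]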

As Fielding points out, constraining clients to crawl your service, instead of compiling against it, can have a performance cost. On the other hand, clients are welcome to optimize by caching the structure of a service for a time specified by the server, using the expiration and validation mechanisms built into HTTP/1.1. The extra cost of crawling need not be paid any more often than necessary.
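A sketch of that optimization using HTTP validation, assuming the server sends an ETag (the function and its arguments are mine, for illustration):

import urllib2

def refresh(url, etag, cached):
    # Revalidate a cached representation, refetching only if it's stale.
    req = urllib2.Request(url)
    if etag:
        req.add_header('If-None-Match', etag)
    try:
        resp = urllib2.urlopen(req)
        return resp.headers.get('ETag'), resp.read()   # fresh representation
    except urllib2.HTTPError, e:
        if e.code == 304:
            return etag, cached                        # cached copy still good
        raise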

Finally, consider that you might not even need REST in your API. Seriously, you might not need it. Not every service needs to span many organizations, or support dozens of different clients. Not every service needs to be around for 10, 15, 20 years.

I'm eager to see if the touted GeoCommons API has the hypertext constraint. I'm almost certain it will declare itself to be "RESTful", both because of the zeal of the company's marketing folks, and because its CTO, Andrew Turner, is honestly big on web architecture. If it does, it would be taking a step towards becoming a real spatial data infrastructure.

Update (2009-01-15): I just remembered that Subbu Allamaraju has a related and much more detailed article on describing RESTful applications.

Update (2009-01-20): links are already a requested Congress API feature.

Comments

Re: Links in content

Author: Sean

Right. A GeoJSON API can invent its own linking construct, like ESRI did, but there's a risk of getting it wrong or, at the very least, too different. Until there's a standard, or at least consensus, JSON APIs will tend to be like snowflakes. And again: API developers should consider whether they need REST. An HTTP API that uses GET/POST properly and supports expiration/validation (and maybe even paging) is worthy enough. Hierarchical URIs like the ones of the Congress API do let you add in the hypertext constraint for full REST if you want it, and so have a non-cosmetic advantage, after all.

Re: Links in content

Author: Keyur

KML network links is a great example as well.

linking

Author: Ian Bicking

I don't think GeoJSON is necessarily any more lacking in linking than, say, Atom. XML doesn't have any native sense of a "link", but Atom does -- if you use the link tag, you are creating a link (when using some extension tag, though, it's unclear if you are linking or not). It's a link because the Atom specification (itself built on the XML syntax) defines it as a link. Similarly GeoJSON builds on the JSON syntax, but any linkyness is based on the GeoJSON specification.

Re: Links in content

Author: Sean

Yes, Ian, but we punted when faced with specifying links for GeoJSON; it doesn't have them.

Re: Links in content

Author: Andrew Turner

Indeed, this goes back to the topic from quite a bit ago, and also why I'm such a fan of OpenSearch. It's simpler and more broadly applicable than WADL while giving simple links to the broadest use of any API, with results from there linking to individual resources and methods. The NavigatingWashington site is primarily just embeds, with just a little tinge of what we're doing with the API.

GIS consultancy stimulus proposal

US $1,200,000,000 is a lot of GeoPork. While I think the proposed data sets are pragmatic enough (maybe substitute climate/environmental data for wildlife data), I can't get behind any proposal for this level of public funding that doesn't explicitly put open data in the public's hands ("publicly-accessible" is far too vague). And I shudder to think of $450,000,000 spent to line SOA's casket.

Via APB.

Update (2009-01-13): "shudder to think" is a rather lame cliche that I regret using. Apologies, my dear readers. I really feel more like Captain Blackadder reading orders from General Melchett – which means intense, visceral shuddering, waves of violent fear and loathing, and a hyperbolic analogy invoking unfair stereotypes of the French.

OpenLayers and Djatoka imagery

Hugh Cayless has written an OpenURL image layer for OpenLayers that pulls imagery from Djatoka. I'm eager to see it in action. I've heard other library folks talking about doing this kind of thing with "GeoPDF"; my hope (I'm not speaking for Hugh or UNC) is that they'll take a look at this kind of non-proprietary solution before they do.

More decoration

Christopher Schmidt explains the traditional approach to wrapping functions and methods, one I use regularly; Python's built-in property function, as a decorator, produces read-only properties, but can provide read-write property access when used traditionally.

Are decorators merely cosmetic? I'm of the opinion that some syntaxes are better than others. You're likely to agree that:

>>> 1 + 2
3

is more concise, readable, and intuitive than

>>> int(1).__add__(2)
3

but may not agree that Python's decorators are a syntactic improvement. PEP 318 was hotly debated, but is final; decorators are in, and they'll be expanded in 3.0.

The motivation for decorators is compelling:

The current method of applying a transformation to a function or method places the actual transformation after the function body. For large functions this separates a key component of the function's behavior from the definition of the rest of the function's external interface. For example:

def foo(self):
    perform method operation
foo = classmethod(foo)

This becomes less readable with longer methods. It also seems less than pythonic to name the function three times for what is conceptually a single declaration. A solution to this problem is to move the transformation of the method closer to the method's own declaration. The intent of the new syntax is to replace:

def foo(cls):
    pass
foo = synchronized(lock)(foo)
foo = classmethod(foo)

with an alternative that places the decoration in the function's declaration:

@classmethod
@synchronized(lock)
def foo(cls):
    pass

Even if calling code isn't exactly broken, wrapping a function more than likely changes the function's signature in some way; keeping all signature specification (such as it is in Python) at the head of a function is a good thing and requires some syntax like that of PEP 318. GIS programmers who've come to Python in the past several years via ArcGIS should get with @. If you can't or won't, that's fine too; there's another way, as Christopher shows.
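One related tip: since Python 2.5, the standard library's functools.wraps can be applied inside a wrapper so that the decorated function keeps its original name and docstring. A sketch (the logged decorator and buffer_all function are invented for illustration):

import functools

def logged(func):
    @functools.wraps(func)  # copy func's __name__ and __doc__ onto wrapper
    def wrapper(*args, **kwargs):
        print "calling %s" % func.__name__
        return func(*args, **kwargs)
    return wrapper

@logged
def buffer_all(geoms, distance):
    """Buffer each geometry by the given distance."""
    return [g.buffer(distance) for g in geoms]

Without wraps, buffer_all.__name__ would be 'wrapper' and its docstring would be lost.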

On "prettier code": all else being equal, prettier code is more readable code. It's code that can teach, that can be more easily modified by others. In some ways, better code.

One downside of the decorator syntax: ability to test decorators in a doctest eludes me. The following:

def noisy(func):
    """
    >>> @noisy
    >>> print foo()
    Blah, blah, blah
    1
    """
    def wrapper(*args):
        print "Blah, blah, blah"
        return func(*args)
    return wrapper

@noisy
def foo():
    return 1

fails:

Exception raised:
    Traceback (most recent call last):
    ...
       @noisy

    ^
     SyntaxError: unexpected EOF while parsing

Could be ignorance on my part.

Comments

Re: More decoration

Author: Christopher Schmidt

def noisy(func):
    """
    >>> @noisy
    ... def foo():
    ...     return 1
    >>> foo()
    Blah, blah, blah
    1
    """
    def wrapper(*args):
        print "Blah, blah, blah"
        return func(*args)
    return wrapper

Re: More decoration

Author: Sean

Moving foo inside the docstring doesn't help. It was being found by doctest before.

Re: More decoration

Author: Christopher Schmidt

I guess I don't know what you're trying to do. I'm not trying to test 'foo', I'm trying to test 'noisy'. So I define a 'foo' that uses 'noisy', and I test that 'foo' does what I want. (In this case, foo does nothing except return '1'; this would still be the case even with the real 'foo'.)
disciplina:~ crschmidt$ python foo.py -v
Trying:
    @noisy
    def foo():
        return 1
Expecting nothing
ok
Trying:
    foo()
Expecting:
    Blah, blah, blah
    1
ok
1 items had no tests:
    __main__
1 items passed all tests:
   2 tests in __main__.noisy
2 tests in 2 items.
2 passed and 0 failed.
Test passed.
So, I guess I don't know what you're trying to do.

Re: More decoration

Author: Sean

Ah, I finally see what's up. My original docstring text
  >>> @noisy
  >>> def foo():
  ...
is invalid Python, which is obvious if you type it into a prompt:
  >>> @noisy
  ...
Thanks for the help, Christopher. And yes, best to define the foo mock within noisy's docstring test.