Speaking of the open web, here's Mark Pilgrim's take on HTML5 geolocation:
Geolocation is the art of figuring out where you are in the world and (optionally) sharing that information with people you trust. There are many ways to figure out where you are — your IP address, your wireless network connection, which cell tower your phone is talking to, or dedicated GPS hardware that receives latitude and longitude information from satellites in the sky.
You can also pick your location, or any other location at all that suits your needs, from a map using René-Luc's Firefox Geolocater.
Apparently, a lot of the "GeoWeb" is made of blue legos. Say what you will about HTML and SVG, but open web stuff comes with fewer and fewer of these nasty surprises and titanic games of chicken thundering around your business.
An email this morning reminded me that my recent post on web linking (a rehash of threads on rest-discuss and geo-web-rest) also covered a lot of the same ground Andrew Turner did in http://highearthorbit.com/geoweb-standards-discoverability/. They are written for different audiences, but ones that overlap a bit, and go pretty well together.
On the descending portion of the hype cycle now it seems that, like a guy
in a "Rock Star" t-shirt, a "REST API" most likely isn't. It might be using
HTTP as a uniform interface and identifying things with URIs, but then you find
it provides text/xml or application/json responses with no links and
out-of-band rules for teleporting (you can't call it traversing) to other
parts of the API. Tight coupling like that is not what REST is about.
One that's getting very close is GeoServer's Configuration API. It has links
from workspaces to datastores to layers, and a non-HTML client should in theory
be able to follow them, changing the configuration state of the service in a
step-by-step manner, led by the service itself, in much the same way you would
through a web browser. All from one bookmarkable URI. This is what REST is
about.
I say "in theory" because the GeoServer API doesn't hold water for formats
other than HTML. Here's the problem: given a bookmarked URI ending in
"workspaces" like http://example.com/workspaces, how does a client determine
that this URI identifies a resource to which you can POST a new workspace and
begin the configuration process using in-band information only? If you're
working with a text/html representation of the resource, you'll be shown a
form, and away you go, RESTfully. The semantics of forms, and specifically that
submitting one sends data to a certain URI, are defined in the text/html media type standard. A client doesn't need any out-of-band information: the form is in the representation, and the semantics are specified by the standard "text/html" value of the Content-Type header, both in-band. Now, if the server sends you back a
text/xml response, there's no way for a client to know only from in-band
information how it is to act on the response. That it's a certain type of
resource (a GeoServer Workspace) because the URI ends in "workspaces" and the
representation has a root <workspaces> element? That's out of band. That the
bookmarked URI is a "GeoServer workspace bookmark"? That's out of band too.
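To make this concrete, here's a minimal sketch (modern Python, with made-up helper functions) of a non-HTML client trying to proceed on in-band information only. Everything hinges on the media type: text/html comes with standardized form semantics, text/xml comes with nothing the client can act on.

import urllib.request

def begin_configuration(bookmark):
    """Act on a bookmarked URI using in-band information only."""
    response = urllib.request.urlopen(bookmark)
    media_type = response.headers.get_content_type()
    body = response.read()
    if media_type == "text/html":
        # The text/html standard defines form semantics: find the form,
        # read its action URI and method, submit it. All in-band.
        return submit_form(parse_html_form(body))
    if media_type == "text/xml":
        # "This is XML" is all the media type says. Nothing in the
        # response tells the client that POSTing a <workspace> document
        # here creates a workspace; that knowledge is out-of-band.
        raise NotImplementedError("no standardized rules to act on")

The parse_html_form and submit_form names above are stand-ins for an HTML-forms-aware client library; the point is only that such a library can be written against the text/html standard alone.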
AtomPub, on the other hand, holds water because the POST-ability of service
resources (for creating new collections) is standardized under the media type
"application/atomsvc+xml". If a client GETs a URI and that format comes back,
the POST-ability is communicated, in-band. The "application/atom+xml" media
type does the same for collections and entries, especially in specifying that an "edit" link tells the client which resource to use to modify entry and collection state. Standardizing on Atom and AtomPub, if you
can, is therefore a good bet.
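Here's a rough sketch of that in-band discovery, with an invented service document for a configuration service like GeoServer's. The document and URIs are hypothetical, but the processing rules aren't: RFC 5023 says the href of an app:collection element identifies a resource that accepts POSTs of new members, and a client learns that from the application/atomsvc+xml media type alone.

from xml.etree import ElementTree

APP = "{http://www.w3.org/2007/app}"

# A hypothetical service document, as it might come back with
# Content-Type: application/atomsvc+xml.
service_doc = """\
<service xmlns="http://www.w3.org/2007/app"
         xmlns:atom="http://www.w3.org/2005/Atom">
  <workspace>
    <atom:title>Configuration</atom:title>
    <collection href="http://example.com/workspaces">
      <atom:title>Workspaces</atom:title>
    </collection>
  </workspace>
</service>"""

root = ElementTree.fromstring(service_doc)
for collection in root.iter(APP + "collection"):
    # Per RFC 5023, this href identifies a resource to which a client
    # may POST new members. That's POST-ability, in-band.
    print("POST to %s" % collection.get("href"))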
The interesting thing about REST, the thing that distinguishes it from other styles, is that interaction is driven by in-band information. Loose coupling,
evolvability, and longevity are properties of a system that has the hypertext
constraint. To get these properties, GeoServer and other APIs need to eliminate
the out-of-band communication. Standardize on media types like HTML or Atom,
mint their own media types (application/vnd.geoserver+xml or some such), or use
links with standard relations in HTTP headers (aka Web Linking) and push for
client support of those.
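A minimal sketch of that last option, with invented URIs and an invented extension relation: a response to GET on the workspaces resource could carry typed links in its headers, leaving the body format alone. "alternate" is a registered relation; anything GeoServer-specific would be an extension relation expressed as a URI.

# Hypothetical headers for a response to GET http://example.com/workspaces.
headers = [
    ("Content-Type", "application/xml"),
    ("Link", '<http://example.com/workspaces/topp>; '
             'rel="http://example.com/rel/workspace"'),
    ("Link", '<http://example.com/workspaces.html>; '
             'rel="alternate"; type="text/html"'),
]

A client that knows the extension relation can follow it; one that doesn't simply ignores it, and nothing about the XML body has to change.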
Clear Climate Code has been working on ccc-gistemp, a project to reimplement NASA's GISTEMP in clear Python. GISTEMP is a global historical temperature analysis; it produces, among other things, graphs that tell you whether the Earth is getting warmer or cooler.
I haven't seen any links from geospatial or GIS blogs to Jeni Tennison's excellent piece about the motivation for choosing the web's architecture as the architecture for the UK's open data initiative and for choosing linked data instead of the usual "services" that make up US data initiatives.
Why?
Because linked data is just a term for how to publish data on the web while working with the web. And the web is the best architecture we know for publishing information in a hugely diverse and distributed environment, in a gradual and sustainable way.
Read it and check out the links to tutorials about creating linked data.
Comments
Re: Linking UK data
Author: Ian Turton
I think that is because all the UK data is just text and Excel tables. The OS will give up their data when it's pried from their cold dead fingers, and don't even think about geocoding via a postcode: Royal Mail are even worse!
Ian
Re: Linking UK data
Author: Sean
GIS data, rasters excepted, is also largely tabular, wouldn't you say? What's a shapefile if not a table? GML allows different structures, but is less commonly used in that way, and the RDF model is equally suited for those special complex features cases.
Speaking of the OS, it may not be giving away coordinates yet, but it has interesting and possibly useful linked data at http://data.ordnancesurvey.co.uk/.
I posted a link to the listing of GIS stuff in the Python Package Index the
other day and was reminded that there's not quite enough information there for
people looking to match their requirements with software. Click through some of
the links and you may find no distribution at all, or a dependency on some C
library (that would be my work). The descriptions can also be a little vague,
such as "Geospatial geometries, predicates, and operations". Click through the
accompanying link to Shapely and you get a slightly more verbose description:
Shapely is a Python package for manipulation and analysis of 2D geospatial
geometries. It is based on GEOS (http://geos.refractions.net). Shapely 1.0 is
not concerned with data formats or coordinate reference systems.
Responsibility for reading and writing data and projecting coordinates is
left to other packages like WorldMill and pyproj. For more information, see:
Shapely wiki
Shapely manual
Shapely requires Python 2.4+.
That's a little more to go on, but assumes that you're already a GIS
programmer. It's a terrible assumption to be making when you consider how
wedded GIS programmers are to not-open source or not-Python platforms and how
unlikely they are to be trying Shapely. The real audience is Python programmers coming from outside the GIS business, and the description doesn't explain to them at all why they'd want to use Shapely. That's something I'm going to remedy, starting
with a little blog rambling.
Imagine a situation where you'd like to find or index a substring within
another string. Is there "overlap" between the strings, and if so, what is it?
Or maybe you'd like to replace certain characters in a string with others. Now
imagine that you're compelled to load the text strings into a relational
database to perform these operations because such string functions aren't
available in any other context. No knock on the RDBMS, a tremendously useful
thing, but that's an unacceptable situation.
The premise of Shapely, or one of the premises, is that Python programmers
should be able to perform PostGIS type geometry operations outside of an RDBMS.
Another is that Python idioms trump GIS (or Java, in this case, since the GEOS
library is derived from JTS, a Java project) idioms. Shapely, in a nutshell, lets you do PostGIS-y stuff outside the context of a database using idiomatic Python. I've got a ways to go yet in explaining this because I've left you needing to know what PostGIS-y stuff is.
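For a taste, though, here's the sort of thing I mean, with a couple of made-up shapes and no database anywhere in sight:

>>> from shapely.geometry import Point
>>> patch = Point(0.0, 0.0).buffer(10.0)   # a polygonal "circle"
>>> hole = Point(2.0, 2.0).buffer(3.0)
>>> remainder = patch.difference(hole)     # a set-theoretic, PostGIS-y operation
>>> remainder.geom_type
'Polygon'
>>> remainder.contains(Point(2.0, 2.0))    # the hole's center is gone
False
>>> remainder.contains(Point(-5.0, -5.0))  # but this point is still covered
True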
That's the theory, but are spatial operations outside of an RDBMS all that
interesting in practice? They are to the GIS community: the OGC's Web
Processing Service specification was written to standardize such practice,
and I know a bunch of Python programmers that are gonzo for WPS. Myself, I'm -1
on the WPS standard, but +1 on spatially smart web intermediaries. This is one
of the possible classes of Shapely applications.
My toy Mush is an example of a pipe-like app that uses Shapely. Feeds go in, feeds come out. Check out this
map of intersecting regions around geo-referenced photos in a Flickr feed
produced using Shapely. Look, Ma: no database.
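The gist of it, reduced to a sketch with made-up photo locations standing in for the ones parsed from the feed: buffer each photo's point into a region, then test the regions pairwise for intersection.

from itertools import combinations
from shapely.geometry import Point

# Hypothetical photo locations (lon, lat) from a georeferenced feed.
photos = {
    "photo-1": Point(-105.08, 40.59),
    "photo-2": Point(-105.07, 40.60),
    "photo-3": Point(-104.90, 40.40),
}

# A rough "region around" each photo: a buffer of about 0.02 degrees.
regions = dict((pid, pt.buffer(0.02)) for pid, pt in photos.items())

# Pairwise overlaps, computed entirely in memory.
for (a, ra), (b, rb) in combinations(sorted(regions.items()), 2):
    if ra.intersects(rb):
        print("%s overlaps %s" % (a, b))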
The Web Linking internet draft is in last call. This means that soon
we'll have a standardized registry of web link relation types, rules for
extending the set of registered links, and rules for serializing links in HTTP
headers and/or request and response bodies. The ID also defines what a link is:
In this specification, a link is a typed connection between two
resources that are identified by IRIs [RFC3987], and is comprised of:
A context IRI, and
a link relation type (Section 4), and
a target IRI, and
optionally, target attributes.
A link can be viewed as a statement of the form "{context IRI} has a
{relation type} resource at {target IRI}, which has {target
attributes}."
An IRI, if you don't know, is an Internationalized Resource Identifier, the
Unicode complement to the URI. The draft uses IRI in its language, but you can
read it as URI or URL without loss of meaning.
I'm not going to blog about every last call, but this one is especially
interesting to me and relevant to the discussion about GIS and Web
architectural styles. If you look at the header on its status page, you see
that the draft is a "Proposed Standard". It would be a standard for the entire
internet, not just a particular business domain. New media can standardize on
it. Library systems can standardize on it. Geospatial systems can standardize
on it. The proposed Web Linking standard has been the context for my writing
and blogging about a where link relation which I'd like to submit for
registration soon – let me know if you recognize yourself as a stakeholder and
we'll do it together.
This last call comes at about the same time Ron Lake wrote the following in an
article partly responding to "some people" who ask where is the web in
"GeoWeb":
Some of the issues revolve around the weak typing and weak semantics of a
hyperlink. In the web of documents this does not matter so much, since this
is a world with a person in the loop. Get the wrong document? Check again.
Much tighter specification of type and semantics is required in the web of
systems, or chaos may result.
His article was illustrated with a (different) image of a staffed switchboard
to emphasize or exaggerate the dependency of the web on human operators. I believe that is in
fact not Andrew Turner at the very back of this one I found on Flickr.
An HTML <img> element is a specialized link with very tight semantics that
is often wrapped, as in the case of the very image above, by a more generalized
link to a home page for the image. What the Flickr resource means to this blog
post is rather underspecified by the link I'm using, but the semantics of the
<img> tag need no human interpreter at all.
Let's consider what links bring to a modern web mapping application in your web
browser. When you use the browser to fetch the HTML representation of a web map
page, it finds among other things HTML <link> elements with rel="stylesheet"
and various <script> elements. A script is a link with extra well-defined
semantics. A web browser "knows" these semantics via the processing rules labeled "text/html": it's supposed to fetch the stylesheet resources identified by those links using HTTP GET and apply them in rendering the HTML page. Following
other rules in the same "text/html" set, the browser fetches javascript files
and interprets them. That code might create new <script> elements in the DOM,
thereby loading, dynamically, more javascript without any human intervention.
Only after this (in general) does a human enter the loop. That human uses the
javascript UI to choose an area of interest, code creates <img> elements in the
page's DOM (as I wrote before, an <img> is yet another specialized link), and
the browser "knows" once again following others in the same set of rules that
it is to fetch the imagery and render it in the page to show the user. HTML is
full of links with strong semantics and non-human agents use them to great
effect. In not one of those cases there did a human need to judge the semantics
of a link or the type of thing it references. Non-browser web applications can
exploit links in similar ways to accomplish different tasks.
The initial registry for Web Linking includes some fuzzy relation types like
"payment" (indicates a resource where payment is accepted), but also sharper
ones like "previous" and "next". Extension types may be as semantically fine as
necessary. My feeling about a "where" link relation is that it ought to indicate a resource representing the coordinates of the link's context so that it could
be used, with a gazetteer, in place of literal geometries in (for example) an Atom feed:
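Something like this hypothetical entry, say, where the geometry lives behind a typed link instead of inline in the feed. The URIs, the café, and the gazetteer are invented, and "where" is still only a proposed relation:

from xml.etree import ElementTree

ATOM = "{http://www.w3.org/2005/Atom}"

entry = ElementTree.fromstring("""\
<entry xmlns="http://www.w3.org/2005/Atom">
  <title>Silver Grill Cafe</title>
  <id>tag:example.com,2009:silver-grill</id>
  <updated>2009-12-01T00:00:00Z</updated>
  <link rel="alternate" type="text/html"
        href="http://example.com/places/silver-grill"/>
  <link rel="where" type="application/vnd.google-earth.kml+xml"
        href="http://gazetteer.example.com/features/silver-grill"/>
</entry>""")

# A client that understands the proposed "where" relation follows it to
# get the geometry; one that doesn't simply ignores the link.
for link in entry.findall(ATOM + "link"):
    if link.get("rel") == "where":
        print("geometry lives at %s" % link.get("href"))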
In practice, the target of the link ought to come in a standard content type
such as RDF/XML, GML, or KML that has well-defined geometries, or as HTML with
an alternate link to a geographically-suited format.
Read the section about links in HTTP headers too: imagine turning legacy GIS
data files into linked data with just a few rewrite rules.
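Rewrite rules are server configuration, but the same trick in Python terms is a few lines of WSGI middleware. Everything below — the paths, the geometry resource, and the still-proposed "where" relation — is hypothetical:

def add_geo_links(app, links):
    """WSGI middleware: attach Link headers to responses for known paths.

    ``links`` maps a request path to the URI of a resource representing
    its geometry.
    """
    def middleware(environ, start_response):
        def _start_response(status, headers, exc_info=None):
            target = links.get(environ.get("PATH_INFO", ""))
            if target:
                headers = list(headers) + [
                    ("Link", '<%s>; rel="where"' % target),
                ]
            return start_response(status, headers, exc_info)
        return app(environ, _start_response)
    return middleware

# Usage: wrap whatever app serves the legacy files.
# app = add_geo_links(static_file_app, {
#     "/data/roads.zip": "http://example.com/data/roads/extent.kml",
# })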
Shapely 1.2a1 has been tagged and uploaded to http://gispython.org/dist so that
people don't get it by mistake from PyPI. To install and try it out (in a
virtualenv):
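Something along these lines should do it, assuming pip and a GEOS library already installed:

$ pip install --find-links=http://gispython.org/dist Shapely==1.2a1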
You'll need a GEOS version >= 3.1.1 to try the new prepared geometry class:
>>> from shapely.geometry import Point, Polygon
>>> triangle = Polygon(((0.0, 0.0), (1.0, 1.0), (1.0, -1.0)))
>>> from shapely.prepared import prep
>>> p = prep(triangle)  # pre-analyze for efficient ops
>>> p.intersects(Point(0.5, 0.5))
True
Most of the work toward 1.2 has been done by Aron Bierbaum. Other features
include geometry simplification, a switch to the new reentrant functions in
libgeos_c, setup script consolidation, and more tests.
Re: Shapely 1.2a1
Author: Ian
Being new to this stuff, I'm a bit vague on what Shapely is for; examples would be really helpful.
Re: Shapely 1.2a1
Author: Sean
Ian, I'm excited to see you crossing over into geographic applications! Anything I can do, please let me know.
All I've got for Shapely is a manual, a wiki page, and a readme. All of these, admittedly, lack a good "motivation" section (I'm as guilty of falling for the self-evidence of GIS as anyone), which could be reduced to this: Shapely allows you to do PostGIS-y stuff with idiomatic Python outside the context of a database. I should probably just replace all the GIS professional language with exactly that in the readme and wiki.
Re: Shapely 1.2a1
Author: Michael Weisman
This looks great, but I think I'm missing something about when prepared geometries should be used.
I modified some code I have that does point-on-polygon overlays with n points to ~90,000 polygons to use prepared geometries, and "time python script.py" tells me that it actually takes 41.493 sec/point with prepared geoms vs 37.942 sec/point with standard Shapely Polygon objects. The quick sample in your post also runs slightly slower with prepared geometries (0.098 sec vs 0.095 sec). When do the benefits of using prepared geoms outweigh the cost of creating them?
Thanks,
Michael
Re: Shapely 1.2a1
Author: Sean
Michael, you're prepping a point for comparison to many polygons? No gain there, I think. The feature is designed for the other case: preparing a complex polygon for comparison to many simpler geometries. We'll make sure that there's adequate documentation of this before a final 1.2.
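In other words, something like this, with made-up data: prep the one big polygon once, then throw lots of points at it.

>>> import random
>>> from shapely.geometry import Point
>>> from shapely.prepared import prep
>>> big = Point(0.5, 0.5).buffer(0.4, 128)   # one detailed polygon
>>> prepared = prep(big)                     # pay the analysis cost once
>>> points = [Point(random.random(), random.random()) for i in range(1000)]
>>> prepared.intersects(Point(0.5, 0.5))
True
>>> hits = [pt for pt in points if prepared.intersects(pt)]

The spatial index behind the prepared geometry gets built once and reused for every one of those tests; prepping each of 90,000 polygons only to test a handful of points against each pays that construction cost 90,000 times and never earns it back.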
Re: Shapely 1.2a1
Author: Michael Weisman
I was actually prepping 90,000 polys and leaving the points as standard Point geom objects. I'll play with this some more and see what I can work out.
Python does a lot more for GIS programmers and analysts than just replace Avenue: we're up to 52 distributions tagged Topic::Scientific/Engineering::GIS in PyPI with more untagged ones such as pyGeoDb.
Comments
Re: In which we go into the weeds for some REST
Author: Chris Holmes
Thanks for the review Sean, our goal is to make GeoServer as RESTful as possible (indeed when we have the time we'd like to do REST feature access alternative to WFS).
Practically I'm still not sure of the best way forward. Atom does seem better than text/xml, but even if we did that wouldn't we still want to have text/xml representations of resources? Or are you advocating replacing all the text/xml responses with Atom/AtomPub?
As for application/vnd.geoserver+xml - isn't that out of band in its own way? Like developing a client against it you'd still need to know something about that format? Or you're saying it'd just be a better self-documenting one? I'd be interested in your ideas of what exactly that looks like. And again, would it replace text/xml responses?
As for Web Linking, it looks great but is it even accepted yet? Not that we're opposed to implementing a developing standard and encouraging its adoption, but I think things like feature access through REST are higher priority for us. And the idea with that is we'd just add http headers to our text/xml responses? If you want to help us you could sketch out exactly what headers we should add - I think it's pretty easy to add in extra http headers, so if it's not much of an effort we might be able to do it soon.
Re: In which we go into the weeds for some REST
Author: Allan Doyle
Chris asks "As for application/vnd.geoserver+xml - isn't that out of band in its own way? " -- That was my question, too. I was going to ask about "application/atomsvc+xml" instead.
Sean said
AtomPub, on the other hand, holds water because the POST-ability of service resources (for creating new collections) is standardized under the media type "application/atomsvc+xml". If a client GETs a URI and that format comes back, the POST-ability is communicated, in-band.
Isn't that only because RFC 5023 says it's POST-able? Then RFC 5023 is the out-of-band knowledge.
Re: In which we go into the weeds for some REST
Author: Sean
RFC 5023 isn't out-of-band: it and application/atomsvc+xml and application/atom+xml are part of the fabric of the web.
For a different take on the subject you should check out http://www.subbu.org/blog/2009/12/media-types-and-plumbing.
Governance of out of band semantics
Author: Rob Atkinson
There is a significant implication in using a special MIME type that is "part of the fabric" of the web to indicate how a client is supposed to interpret content, and the actions it may take as a result. This basically implies that conformant RESTful semantics are only possible within the governance framework of the web "fabric" - it's not open to application domains to define semantics or behaviour of APIs.
Perhaps application API semantics have to be considered as an out of band (from REST point of view) on top of REST. I.e. REST semantics is an out-of-band part of any application API. This perhaps makes sense, because strong governance (fabric of the web) is useful in an out-of-band context, whereas private application semantics are much more problematic as out-of-band information, because they are hard to discover, formalise consistently, create and interpret.
Re: In which we go into the weeds for some REST
Author: Sean
I really must rethink what I've said about the GeoServer API not holding water after another read of Subbu's post. I've agreed with those who say "application/xml is not the media type you're looking for" (Mark Baker, Jim Webber) if you want to use web links, not XLinks, and that link semantics aren't conveyed by the Atom namespace but by the media type. While that's still my take, I should give Subbu's approach a try. If you stick to the minimum HTTP protocol (Atom invents a somewhat specialized one) and straight XML processing (Atom has distinctly different processing rules, like "must ignore"), application/xml might be fine. At any rate, I think you'd at least need an actual namespace for workspace and data store elements to make this hold water using application/xml.
No mistake: a configuration API is a great place to start getting into the REST style, and GeoServer is getting close to nailing it.
Re: In which we go into the weeds for some REST
Author: Sean
There's another issue in the GeoServer docs, which document a practice of interaction driven through a fixed URI hierarchy (see Fielding's fourth bullet at http://roy.gbiv.com/untangled/2008/rest-apis-must-be-hypertext-driven). There's a protocol implied there that application/xml doesn't begin to hint at. Better: drive interaction through the links that are already present in GeoServer's workspace (and friends) representations. GeoServer is ready for REST in a lot of ways, but documents a contrary usage that will result in unnecessary coupling.