Public Service, Public Data, and the Web

Next time I speak to a non-technical audience about the "GeoWeb", I'm going to lean heavily on Paul Ramsey's clever and informative talk (click through for the PDF).

Glossing over the different architectures of the Internet is the right thing to do for that audience, so long as you don't get them hooked with lots of Web examples (like pages 84-93 of the PDF) and then sell them "Web Services" (like WFS on page 94) that aren't actually of the Web and haven't really been a success at getting data into the public's hands. Don't bait and switch, in other words.

Barrington Atlas Feature IDs and Unicode Normalization

Last week I was at NYU's Institute for the Study of the Ancient World to plan our Concordia project, an effort to interlink projects like Pleiades, IRCyr, the ANS database, and APIS, and build a traversable, searchable network of data based on Web architecture. I'll be blogging more about this all through the year -- expect my blog to intersect more with the Mapufacture and FortiusOne blogs if they continue on course. Some of it may even be interesting to mainstream GIS designers and developers. ESRI's implicit embrace of Web architecture (along with the explicit embrace of Google) was one of the big stories out of Where 2.0, after all.

One thing that cropped up in our sessions was the need for other Ancient World projects to be able to refer to Barrington Atlas features in Pleiades by URIs derived from their atlas labels. Tom came up with a template for these URIs:

http://pleiades.stoa.org/batlas/{label-normalized}-{map}-{grid}

and simple rules for normalization that just so happen to be implemented already in plone.i18n. We've forked plone.i18n (in a friendly way: forking is now cool thanks to git, right?) and removed the Zope utilities and all dependency on the Zope component architecture. The result is pleiades.normalizer, and it reduces Barrington Atlas labels, which may contain annotation and non-ASCII characters, to ASCII strings suitable for use in the URI template:

>>> from pleiades.normalizer import normalizer

>>> list(normalizer.normalizeN(u'Tetrapyrgia'))
['tetrapyrgia']

>>> list(normalizer.normalizeN(u'Timeles fl. '))
['timeles-fl']

>>> list(normalizer.normalizeN(u'*Tyinda'))
['tyinda']

>>> list(normalizer.normalizeN(u'[Agrai]'))
['agrai']

>>> list(normalizer.normalizeN(u'Kalaba(n)tia'))
['kalabantia']

>>> list(normalizer.normalizeN(u'Tripolis ad Maeandrum/Apollonia ad Maeandrum/Antoniopolis'))
['tripolis-ad-maeandrum', 'apollonia-ad-maeandrum', 'antoniopolis']

>>> list(normalizer.normalizeN(unicode('Ağva', 'utf-8')))
['agva']

>>> list(normalizer.normalizeN(unicode('Çaykenarı', 'utf-8')))
['caykenari']
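Plugged into Tom's template with a map and grid, the first of those would yield a URI like http://pleiades.stoa.org/batlas/tetrapyrgia-65-c4 (the map and grid values here are invented for illustration).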

The algorithm normalizes non-ASCII characters (to normal form KD) and discards characters that are not letters or digits:

U+011F (ğ) -> (g, U+0306) -> g

U+0131, the last character in Çaykenarı, is a bit of a troublemaker. Here our ASCII "i" carries the diacritical mark relative to its dotless Latin cousin, the inverse of the usual situation, so normal form KD leaves it alone and we have to make a special exception for it in the code.
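The real rules live in pleiades.normalizer, but here's a rough standard-library sketch of the same recipe (the function names normalize and normalize_all are mine, not the package's API):

# Sketch only: pleiades.normalizer does the real work. This mimics
# the recipe above using the standard library's unicodedata module.
import re
import unicodedata

def normalize(name):
    # U+0131 won't decompose under NFKD, so map it to "i" by hand.
    name = name.replace(u'\u0131', u'i')
    # NFKD splits characters like U+011F into (g, U+0306) ...
    decomposed = unicodedata.normalize('NFKD', name)
    # ... and encoding to ASCII with 'ignore' drops the combining marks.
    text = decomposed.encode('ascii', 'ignore').lower()
    # Keep runs of letters and digits; everything else separates words.
    return '-'.join(w for w in re.split(r'[^a-z0-9]+', text) if w)

def normalize_all(label):
    # Composite labels separate alternate names with "/"; yield one
    # normalized string per name, as normalizeN does above.
    for name in label.split(u'/'):
        yield normalize(name)

Run against the doctest labels above, this produces the same strings, but treat it as an illustration only.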

If you're getting started in web development with Python, you might find pleiades.normalizer handy, or at least a starting point for your own normalization code. We won't be publishing it to PyPI, but you can get an egg via:

$ easy_install http://atlantides.org/eggcarton/pleiades.normalizer-0.1.tar.gz

Tilt!

Placebase's Pushpin API provides a format they call "GeoJSON", which should be good news except that it has the wrong coordinate order. Easting and northing are swapped, and the data is therefore tilted over, which is bad. See, for example:

http://rest.beta.pushpin.com/states/CA/counties/alameda.js

The sooner this gets fixed, the better for all of us. Anybody got a line to Placebase?
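For reference, GeoJSON positions put x before y, so a WGS84 point carries longitude first (the values below are invented for illustration):

{"type": "Point", "coordinates": [-122.27, 37.80]}

Swap the two and any renderer that follows the spec will draw your data sideways.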

Update (2008-05-24): The Pushpin response is fixed. I'm excited to see GeoJSON catching on like this.

Comments

Re: Tilt!

Author: jaron

Hey Sean, thanks for bringing this up, and thanks to Tim@UM for letting us know. We're going to get it right and re-release; we'll post again when it's correct. We should have looked at the draft spec more carefully... apologies, we'll get it right...

Re: Tilt!

Author: Sean

Excellent! Thanks, Jaron.

Re: Tilt!

Author: Matt Priour

I think there is still confusion over the proper coordinate order when you actually specify a CRS of EPSG:4326 (i.e. WGS84, decimal degrees with lat/lon order). I got this snippet from the latest trunk version of FeatureServer when I pointed it at a shapefile with a specified (not implied) CRS of EPSG:4326.
{"features": [{"geometry": {"type": "Point", "coordinates": [[29.975341, -90.226626]]}, "id": 0, "properties": {...}},...]}
You will notice the Y, X coordinate order, which seems like it is against the spec, but since I specified a CRS which uses that Y, X coordinate order, it could be right. I was personally +1 for the optional coordinate order element proposal for the GeoJSON spec, but I was clearly in the minority.

Re: Tilt!

Author: Christopher Schmidt

Matt: I don't know what version of FeatureServer you're using, but that is definitely wrong. Can you share your shapefile? To me, it looks like it's buggy data; I've never seen OGR give Y,X ordering, which it looks like you're getting, so I'd like to understand how to know when it's happening and fix it.

Re: Tilt!

Author: Matt Priour

@Christopher I looked at my files and you are right. It is the shapefile itself that is wrong. It was created using ogr2ogr with a CSV text file and a VRT file. The VRT file incorrectly identified the x and y columns:
<GeometryField encoding="PointFromColumns" x="bg_lat" y="bg_long">
So of course the GeoJSON coordinates are sent in the wrong order. If I had tried to use a BBOX on this dataset, I would certainly have gotten incorrect results.

Re: Tilt!

Author: Christopher Schmidt

Garbage In, Garbage Out. Bad data begets bad data.

Re: Tilt!

Author: iwei

We've fixed the GeoJSON formatting by reversing the coordinates (applies to our KML version too). Sorry for the inconvenience and thanks for the catch!

Shapely 1.0.5

Shapely 1.0.5 now includes a flexible polygonizer documented in section 2.5.1 of the updated manual, and makes it harder to create a particular class of broken geometries.
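For a quick taste of the polygonizer, assuming the import from shapely.ops shown in the manual, here's a minimal session of my own (not the manual's example):

>>> from shapely.ops import polygonize
>>> lines = [((0.0, 0.0), (1.0, 1.0)), ((0.0, 0.0), (0.0, 1.0)), ((0.0, 1.0), (1.0, 1.0))]
>>> polygons = list(polygonize(lines))
>>> len(polygons)
1
>>> polygons[0].area
0.5

Three line segments in, one triangle out.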

Shapely Debs

I'm a big fan of the Debian GIS Project and pleased to see that Shapely is getting some of its attention. Meanwhile, I'm still building GEOS 3.0 from source (with Kai's hexagonit.recipe.cmmi) because Hardy is still at GEOS 2.2.3.
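In case it's useful, the buildout part for that looks something like this (the GEOS version and URL here are illustrative):

[geos]
recipe = hexagonit.recipe.cmmi
url = http://download.osgeo.org/geos/geos-3.0.0.tar.bz2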

Line Simplification

I just want to point out how well Schuyler Erle's implementation of Douglas-Peucker line simplification plays with Shapely.

>>> from shapely.geometry import Point
>>> point = Point(0.0, 0.0)
>>> outline = point.buffer(2.0, quadsegs=32).boundary
>>> coords = list(outline.coords)
>>> from dp import simplify_points
>>> simple_coords = simplify_points(coords, 0.25)
>>> from shapely.geometry import LineString
>>> simple_outline = LineString(simple_coords)
>>> outline.length
12.565109003731115
>>> simple_outline.length
12.245869835682875

The simplify_points function requires a sequence of coordinate tuples. The coords property of a Shapely geometry is an iterator over coordinate tuples, but you can make a sequence from the iterator by using Python's built-in list function.
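Schuyler's dp module does the real work here, but for the curious, the classic recursive Douglas-Peucker algorithm behind a function with this signature goes roughly like this (my sketch, not his code):

import math

def _point_line_distance(p, a, b):
    # Perpendicular distance from point p to the line through a and b.
    if a == b:
        return math.hypot(p[0] - a[0], p[1] - a[1])
    num = abs((b[0] - a[0]) * (a[1] - p[1]) - (a[0] - p[0]) * (b[1] - a[1]))
    return num / math.hypot(b[0] - a[0], b[1] - a[1])

def simplify_points(points, tolerance):
    # Find the point farthest from the chord between the endpoints;
    # keep it and recurse on both halves if it exceeds the tolerance,
    # otherwise collapse the run to its endpoints.
    if len(points) < 3:
        return list(points)
    index, dmax = 0, 0.0
    for i in range(1, len(points) - 1):
        d = _point_line_distance(points[i], points[0], points[-1])
        if d > dmax:
            index, dmax = i, d
    if dmax <= tolerance:
        return [points[0], points[-1]]
    left = simplify_points(points[:index + 1], tolerance)
    right = simplify_points(points[index:], tolerance)
    # The point at the split is shared by both halves; don't duplicate it.
    return left[:-1] + right

Back to the session: plotting both outlines with matplotlib shows the simplification at work.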

>>> from numpy import asarray
>>> from matplotlib import pylab
>>> a = asarray(outline)
>>> fig = pylab.figure(1, figsize=(5,5), dpi=72)
>>> pylab.plot(a[:,0], a[:,1])
[<matplotlib.lines.Line2D instance at 0x8ae7d8c>]
>>> b = asarray(simple_outline)
>>> pylab.plot(b[:,0], b[:,1])
[<matplotlib.lines.Line2D instance at 0x90bc86c>]
>>> pylab.show()

The original:

http://sgillies.net/images/outline.png

And simplified:

http://sgillies.net/images/simple-outline.png

The Programming Historian

The Programming Historian looks great to me. It covers HTML parsing (with Beautiful Soup), regular expressions, Unicode, and following links, with more to come.

Via the author, Bill Turkel.

Comments

Re: The Programming Historian

Author: William J. Turkel

Sean, thanks for the plug. Feedback and requests are welcome. We hope to include some geo- stuff at some point in the future, probably starting with a simple bibliographic map mashup. But who knows? We may get to automatically extracting toponyms from texts. Bill