Last week I was at NYU's Institute for the Study of the Ancient World to plan our Concordia project, an effort to interlink projects like Pleiades, IRCyr, the ANS database, and APIS, and build a traversable, searchable network of data based on Web architecture. I'll be blogging more about this all through the year -- expect my blog to intersect more with the Mapufacture and FortiusOne blogs if they continue on course. Some of those posts may even be interesting to mainstream GIS designers and developers. ESRI's implicit embrace of Web architecture (along with its explicit embrace of Google) was one of the big stories out of Where 2.0, after all.
One thing that cropped up in our sessions was the need of other Ancient World projects to be able to refer to Barrington Atlas features in Pleiades by URIs derived from their atlas labels. Tom came up with a template for these URIs:
http://pleiades.stoa.org/batlas/{label-normalized}-{map}-{grid}
and simple rules for normalization that just so happen to be already implemented in plone.i18n. We've forked (in a friendly way: forking is now cool thanks to git, right?) plone.i18n and removed the Zope utilities and all dependency on the Zope component architecture. The result is pleiades.normalizer, and it reduces Barrington Atlas labels which may contain annotation and non-ASCII characters to ASCII strings suitable for use in the URI template:
>>> from pleiades.normalizer import normalizer
>>> list(normalizer.normalizeN(u'Tetrapyrgia'))
['tetrapyrgia']
>>> list(normalizer.normalizeN(u'Timeles fl. '))
['timeles-fl']
>>> list(normalizer.normalizeN(u'*Tyinda'))
['tyinda']
>>> list(normalizer.normalizeN(u'[Agrai]'))
['agrai']
>>> list(normalizer.normalizeN(u'Kalaba(n)tia'))
['kalabantia']
>>> list(normalizer.normalizeN(u'Tripolis ad Maeandrum/Apollonia ad Maeandrum/Antoniopolis'))
['tripolis-ad-maeandrum', 'apollonia-ad-maeandrum', 'antoniopolis']
>>> list(normalizer.normalizeN(unicode('Ağva', 'utf-8')))
['agva']
>>> list(normalizer.normalizeN(unicode('Çaykenarı', 'utf-8')))
['caykenari']
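Once a label is normalized, filling in the URI template is plain string assembly. Here is a minimal sketch in Python 3; the batlas_uri helper is hypothetical (it is not part of pleiades.normalizer), and the map and grid values in the usage line are made-up placeholders rather than real atlas references:

```python
# Hypothetical helper, not part of pleiades.normalizer.
BATLAS_BASE = "http://pleiades.stoa.org/batlas/"


def batlas_uri(label_normalized, map_id, grid):
    """Fill the {label-normalized}-{map}-{grid} template from the post."""
    return "%s%s-%s-%s" % (BATLAS_BASE, label_normalized, map_id, grid)


# Placeholder map and grid values, for illustration only.
print(batlas_uri("tetrapyrgia", 65, "b4"))
```

Whether the grid square should be lowercased or otherwise normalized is not settled here; the sketch passes it through unchanged.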
The algorithm decomposes non-ASCII characters (Unicode normalization form KD, or NFKD) and discards the resulting elements which are not letters or digits, such as combining marks:
U+011F -> (g, U+0306) -> g
U+0131, the last character in Çaykenarı, is a bit of a troublemaker. Our ASCII "i" has the diacritical mark relative to its dotless Latin cousin, the inverse of the usual situation. U+0131 has no decomposition, so NFKD leaves it untouched, and we have to make a special exception for it in the code.
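The steps described above can be sketched with nothing but the standard library. This is not the pleiades.normalizer source, only a rough Python 3 approximation of the behavior shown in the session earlier (which is Python 2). The names SPECIAL_CASES, to_ascii, and normalize_label are made up for the sketch, and splitting on "/" and joining words with hyphens are assumptions inferred from the example output:

```python
import unicodedata

# Characters that NFKD cannot decompose but that still need an ASCII stand-in.
# Only U+0131 comes from the post; a real table would likely hold more.
SPECIAL_CASES = {"\u0131": "i"}


def to_ascii(text):
    """Decompose to NFKD, then keep only ASCII letters, digits, and spaces."""
    text = "".join(SPECIAL_CASES.get(ch, ch) for ch in text)
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(
        ch for ch in decomposed
        if ord(ch) < 128 and (ch.isalnum() or ch.isspace())
    )


def normalize_label(label):
    """Yield one lowercase, hyphenated ASCII string per '/'-separated name."""
    for name in label.split("/"):
        words = to_ascii(name).lower().split()
        if words:
            yield "-".join(words)


print(list(normalize_label("Kalaba(n)tia")))
print(list(normalize_label("\u00c7aykenar\u0131")))
```

Mapping U+0131 before decomposition keeps the exception in one small table instead of scattering it through the filtering logic.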
If you're getting started in web development with Python you might find pleiades.normalizer handy, or at least a starting point for your own normalization code. We won't be publishing it to PyPI, but you can get an egg via:
$ easy_install http://atlantides.org/eggcarton/pleiades.normalizer-0.1.tar.gz
Comments
Re: Tilt!
Author: jaron
hey Sean- thanks for bringing this up- and thanks to Tim@UM for letting us know .. We're going to get it right and re-release, we'll post again when it's correct. We should have looked at the draft spec more carefully... apologies, we'll get it right...
Re: Tilt!
Author: Sean
Excellent! Thanks, Jaron.
Re: Tilt!
Author: Matt Priour
I think there is still confusion over the proper coordinate order when you actually specify a crs of EPSG:4326 (ie WGS84, Decimal Degrees with Lat Lon order). I got this snippet from the latest trunk version of Feature Server when I pointed it to a shapefile with a specified (not implied) CRS of EPSG:4326. You will notice the Y, X coordinate order which seems like it is against spec, but since I specified a CRS which uses that Y,X coordinate order, it could be right. I was personally +1 for the optional Coordinate Order element proposal for the GeoJSON spec, but I was clearly in the minority.
Re: Tilt!
Author: Christopher Schmidt
Matt: I don't know what version of FeatureServer you're using, but that is definitely wrong. Can you share your shapefile? To me, it looks like it's buggy data; I've never seen OGR give Y,X ordering, which it looks like you're getting, so I'd like to understand how to know when it's happening and fix it.
Re: Tilt!
Author: Matt Priour
@Christopher I looked at my files and you are right. It is the shapefile itself that is wrong. It was created using OGR2OGR with a csv text file and a VRT file. The VRT file incorrectly identified the x & y columns: So of course, the GeoJSON coordinates are sent in the wrong order. If I had tried to use a BBOX on this dataset, I would have certainly gotten incorrect results.
Re: Tilt!
Author: Christopher Schmidt
Garbage In, Garbage Out. Bad data begets bad data.
Re: Tilt!
Author: iwei
We've fixed the GeoJSON formatting by reversing the coordinates (applies to our KML version too). Sorry for the inconvenience and thanks for the catch!