Fiona and shapefile encoding
It's a common problem in GIS: a shapefile was encoded using a character set other than the standard ISO-8859-1, but carries no record of what that character set was. Some shapefiles even lie about their encodings. You can make an educated guess at the character set, but your decoding may fail. Worse, it may not fail and instead produce garbage when printed out. The Japanese term for this is mojibake. Ned Batchelder introduced me to the term in his Pragmatic Unicode presentation. It's easy to demonstrate in Python.
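A minimal sketch of both outcomes, using a made-up place name rather than data from a real shapefile. The variable names are only illustrative.

```python
# The text as it was meant to be read.
original = "Málaga"

# Case 1: bytes written as UTF-8, then read as if they were ISO-8859-1.
# Decoding does not fail, because every byte is a valid ISO-8859-1
# character, but the result is mojibake.
utf8_bytes = original.encode("utf-8")
garbled = utf8_bytes.decode("iso-8859-1")
print(garbled)  # MÃ¡laga

# Case 2: bytes written as ISO-8859-1, then read as if they were UTF-8.
# Here the wrong guess is caught, because the byte for "á" is not
# valid UTF-8 in this position.
latin1_bytes = original.encode("iso-8859-1")
try:
    latin1_bytes.decode("utf-8")
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True
print(decode_failed)  # True
```

The first case is the dangerous one: nothing raises, so the garbage travels on into whatever consumes the text.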
Fiona's open() function has an encoding keyword argument that is intended to let developers override the missing or erroneous information for a shapefile. A recent regression in Fiona broke this, and users began to report unexpected mojibake symptoms: they were using the encoding argument properly but seeing garbage displayed. This regression has been fixed in Fiona 1.7.11. Upgrade as soon as you can.
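Overriding the encoding as described above might look like the following sketch. The helper read_property_values, the file name places.shp, the field name NAME, and the choice of Windows-1252 are all hypothetical; only fiona.open() and its encoding keyword come from Fiona itself.

```python
def read_property_values(path, field, encoding):
    """Return the values of one attribute field, decoded with `encoding`."""
    # Imported inside the function so that this sketch can be loaded
    # even where Fiona is not installed.
    import fiona

    # The encoding keyword overrides whatever the shapefile claims,
    # or fails to claim, about its character set.
    with fiona.open(path, encoding=encoding) as collection:
        return [feature["properties"][field] for feature in collection]


# Hypothetical usage, assuming a shapefile you believe is Windows-1252:
# names = read_property_values("places.shp", "NAME", "Windows-1252")
```

If the values still come out garbled, the guess at the character set was probably wrong, and it is worth trying another candidate encoding.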
For a time I didn't believe that users were reporting a real problem. I chalked the reports up to dirty data, the way GDAL had been compiled, DLL hell, anything but a regression. In the end, I think what got me unstuck was facing this note I recently added to Fiona's issue template:
You think you've found something? We believe you.
Something I added to make the project more friendly to first-time contributors ended up being a note to myself.