More Fiona

Last night, I saw that Tom MacWright had posted a script for combining Shapefile fields. I thought I'd try to adapt it to Fiona 0.9 and a slightly more functional (FP, I mean) style, and I'm glad I did because in the process I discovered that my slightly old version of GDAL and OGR isn't detecting the Windows-1252 encoding of the Natural Earth dataset. Version 0.9.1 of Fiona is now tagged and uploaded to PyPI, and it lets a user specify the proper encoding of files if needed.
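
To see why the encoding matters, here is a small sketch in plain Python, with no Fiona involved. The byte strings are made-up illustrations, not values taken from the Natural Earth files. It shows that the same bytes decode cleanly as Windows-1252 but fail or turn into control characters under other assumptions.

```python
# Hypothetical attribute value as it might be stored in a Windows-1252
# encoded .dbf file (illustrative only, not read from Natural Earth).
raw = b"C\xf4te d'Ivoire"

# Decoding with the correct encoding recovers the accented character.
as_cp1252 = raw.decode("windows-1252")

# The same bytes are not valid UTF-8, so a wrong guess raises an error.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Windows-1252 and Latin-1 differ in the 0x80-0x9F range: byte 0x92 is a
# right single quotation mark in Windows-1252 but a control character
# in Latin-1.
quote_cp1252 = b"\x92".decode("windows-1252")
quote_latin1 = b"\x92".decode("latin-1")
```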

My version of Tom's script tries to be smart about the width of the new text field and warns if the new values will be truncated. The combining is also done within a function, which allows all output to be written in one statement. Our versions of the script run equally fast.

import fiona
import itertools
import logging
import sys

logging.basicConfig(stream=sys.stderr, level=logging.WARN)

def text_width(val):
    return int((val.split(":")[1:] or ["80"])[0])

with fiona.open(
        '/Users/seang/Downloads/ne_50m_admin_0_countries/'
        'ne_50m_admin_0_countries.shp',
        'r',
        encoding='Windows-1252') as inp:

    output_schema = inp.schema.copy()

    # Declare that the output shapefile will have a new text field called
    # a3_sov, with an appropriately sized field width. 254 is the maximum
    # for this format.
    width = min(254,
        text_width(inp.schema['properties']['sov_a3']) +
        text_width(inp.schema['properties']['sovereignt']))
    output_schema['properties']['a3_sov'] = 'str:%d' % width

    # Define a function that combines properties to produce a value
    # for the 'a3_sov' property. It warns if the value will be truncated
    # at 254 characters.
    def combine_fields(rec):
        val = rec['properties']['sov_a3'] + rec['properties']['sovereignt']
        if len(val) > 254:
            logging.warning("Value %s will be truncated to 254 chars", val)
        rec['properties']['a3_sov'] = val
        return rec

    with fiona.open(
            'ne_50m_admin_0_countries_a3_sov.shp', 'w',
            crs=inp.crs,
            driver=inp.driver,
            schema=output_schema) as out:

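        # A generator expression keeps the writing lazy and also runs on
        # Python 3, where itertools.imap no longer exists.
        out.writerecords(
            combine_fields(rec) for rec in inp)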
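
The width calculation and the combining step from the script above can be tried without any Shapefile at hand. The following is only a sketch on plain dictionaries. The schema widths and the two records are hypothetical stand-ins for what Fiona would provide, and `combined_width` is a helper name introduced here for illustration.

```python
import logging

MAX_WIDTH = 254  # maximum text field width noted in the script above


def text_width(val):
    """Parse the width from a type string like 'str:50'; default to 80."""
    return int((val.split(":")[1:] or ["80"])[0])


def combined_width(properties, names):
    """Sum the widths of the named fields, capped at MAX_WIDTH.

    Hypothetical helper, not part of Fiona.
    """
    return min(MAX_WIDTH, sum(text_width(properties[n]) for n in names))


def combine_fields(rec):
    """Add an 'a3_sov' property built from 'sov_a3' and 'sovereignt'."""
    val = rec['properties']['sov_a3'] + rec['properties']['sovereignt']
    if len(val) > MAX_WIDTH:
        logging.warning("Value %s will be truncated to %d chars",
                        val, MAX_WIDTH)
    rec['properties']['a3_sov'] = val
    return rec


# Hypothetical schema properties and records, shaped like Fiona's.
schema_props = {'sov_a3': 'str:3', 'sovereignt': 'str:32'}
records = [
    {'properties': {'sov_a3': 'FR1', 'sovereignt': 'France'}},
    {'properties': {'sov_a3': 'NZ1', 'sovereignt': 'New Zealand'}},
]

width = combined_width(schema_props, ['sov_a3', 'sovereignt'])
combined = [combine_fields(r)['properties']['a3_sov'] for r in records]
```

Because `combine_fields` returns the record it modifies, it can be handed to `map` or used in a generator expression, which is what lets the script write all of its output in a single statement.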