2012 (old posts, page 2)

Ongoing blog series

I've been writing some posts under the rubric of "geoprocessing for humans". These are generally about keeping software simple, predictable, symmetrical, safe, readable, and well-documented. And simple. Most of all: simple.

A couple of my recent posts are a little different: they're about functional programming, partial functions, processing entire files of records without using any for loops, and so on. This is stuff that a GIS programmer/analyst might not recognize as scripting, or even as practical, for all I know. I think I'll categorize these as "geoprocessing for hipsters". I don't usually begin coding this way, but I often arrive here after a few iterations.
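For anyone wondering what that style looks like, here is a minimal sketch using only the standard library. The records are made up, and `scale_property` and `at_least` are hypothetical helpers invented for this illustration; the point is only that `functools.partial`, `map`, and `filter` can stand in for an explicit loop.

```python
from functools import partial

# Made-up records shaped like GeoJSON features; not from any real dataset.
records = [
    {"properties": {"name": "a", "pop": 120}},
    {"properties": {"name": "b", "pop": 45}},
    {"properties": {"name": "c", "pop": 300}},
]


def scale_property(key, factor, record):
    """Return a copy of record with one property multiplied by factor."""
    props = dict(record["properties"])
    props[key] = props[key] * factor
    return {"properties": props}


def at_least(key, minimum, record):
    """True if the record's property is greater than or equal to minimum."""
    return record["properties"][key] >= minimum


# partial() fixes the leading arguments, leaving a function of one record.
double_pop = partial(scale_property, "pop", 2)
is_big = partial(at_least, "pop", 100)

# map() and filter() take the place of the explicit for loop.
result = list(map(double_pop, filter(is_big, records)))
names = list(map(lambda r: r["properties"]["name"], result))
```

The input records are left untouched because `scale_property` copies the properties before changing them.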

If you find these rubrics fun and/or pedagogical, jump on in.

Geoprocessing for humans: close() and with

Fiona 0.8 has a bunch of important new features: file bounds/extents, sync of bounds and record count with writes, protection against iteration restarts, and more:

0.8 (2012-02-21)
----------------
- Replaced .opened attribute with .closed (product of collection() is always
  opened). Also a __del__() which will close a Collection, but still not to be
  depended upon.
- Added writerecords method.
- Added a record buffer and better counting of records in a collection.
- Manage one iterator per collection/session.
- Added a read-only bounds property.

I've also expanded the manual (still under construction), documenting among other things what Fiona does to keep users from stumbling over external resources: http://toblerity.github.com/fiona/manual.html#closing-files. Don't del your collections or set them to None and then sacrifice a chicken while crossing your fingers and hoping for the best – just call close() or use with and enjoy some peace of mind.
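To make the close() and with advice concrete without depending on any particular Fiona version, here is a minimal sketch of the pattern. `Resource` is a hypothetical stand-in for a file-backed collection, not Fiona's class; it only shows the mechanics that make explicit closing predictable.

```python
class Resource(object):
    """Hypothetical stand-in for a file-backed collection."""

    def __init__(self, path):
        self.path = path
        self.closed = False

    def close(self):
        # A real implementation would flush buffers and release the file here.
        self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.close()
        return False  # do not swallow exceptions raised in the with block


# Explicit close(): try/finally guarantees it runs.
a = Resource("example.shp")
try:
    pass  # read or write records here
finally:
    a.close()

# with: close() runs on the way out, even when the body raises.
try:
    with Resource("example.shp") as b:
        raise ValueError("something went wrong mid-write")
except ValueError:
    pass
```

In both cases the resource is closed at a known point in the program, which is exactly what `del` and garbage collection cannot promise.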

Update (2012-02-22): "layers" (as in OGR) in the last sentence changed to "collections" (as in Fiona).

Connecting many places

Things you find in Pleiades that you don't find in a typical geographic information system include relationships between places that are expressed in the data itself. I blogged about this last summer (accompanying figure reprised below) and talked about it at the 2011 Digital Humanities conference (our poster here).

http://farm6.static.flickr.com/5312/5885119283_fe08cf3758_z_d.jpg

Until today, connections between places have been a little sparse. Loading 200+ milecastles and turrets with connections to Hadrian's Wall changes the situation, at least in Britannia. The representation of Hadrian's Wall in Pleiades doesn't have a published spatial extent of its own, but gets one by virtue of its connections to these other small places. Here is a screenshot from Google Maps.

http://farm8.staticflickr.com/7169/6838202849_73308ba375_b_d.jpg

Here's a closeup near Brampton. The lone placemark at the bottom represents an old quarry that supplied nearby milecastles with rock. For the moment at least, we're asserting that the quarry was connected to the fortifications.

http://farm8.staticflickr.com/7172/6838202859_b19995f36a_b_d.jpg

The remains of Hadrian's Wall are a popular hiking itinerary today. The connected places in the maps above don't quite describe the itinerary because they aren't chained to each other, but I can't stop thinking that we should be making it possible to represent ancient itineraries like the Antonine using places from Pleiades.
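The idea of a place borrowing its extent from the places connected to it can be sketched in a few lines. This is an illustration only, not how Pleiades is implemented: the dictionary layout and the `derived_bounds` function are hypothetical, and the coordinates are invented rather than taken from Pleiades.

```python
# Hypothetical model: each place has optional coordinates and a list of
# places it connects to. Coordinates are invented for illustration.
places = {
    "wall": {"coords": None, "connects_to": []},
    "milecastle-1": {"coords": (-2.70, 54.98), "connects_to": ["wall"]},
    "milecastle-2": {"coords": (-2.60, 55.00), "connects_to": ["wall"]},
    "turret-1a": {"coords": (-2.65, 54.99), "connects_to": ["wall"]},
}


def derived_bounds(places, target):
    """Return (minx, miny, maxx, maxy) of located places connected to target."""
    points = [
        place["coords"]
        for place in places.values()
        if target in place["connects_to"] and place["coords"] is not None
    ]
    if not points:
        return None  # nothing connected, so no extent can be derived
    xs, ys = zip(*points)
    return (min(xs), min(ys), max(xs), max(ys))


bounds = derived_bounds(places, "wall")
```

Note that the connections here are an unordered set. Representing an itinerary would need something more, such as an ordered chain from one place to the next.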

Geoprocessing for humans: pygp

I'm not the only one simplifying terrible Python APIs in the geospatial world. Yesterday, I ran across a blog post about software named pygp. Very much about ArcGIS records and fields, it models data differently than Fiona does but similarly eliminates a lot of boilerplate and provides simple access to all coordinates of a record's shape field.

def example_geometry(path):
    """
    Example showing use of Geometry helper class that does the heavy lifting
    on the geometry object and returns something quite similar to WKT/GeoJSON

    Structure is simple, a tuple of tuple of Point objects, very similar to
    Avenue days of geometry and WKT MultiLineString etc.
    (((0, 498266, 6100519, None, None), (0, 499775, 6100281, None, None),
      (0, 500224, 6098694, None, None), (0, 499616, 6097662, None, None),
      (0, 498346, 6096789, None, None)))

    :param path: Workspace Path
    :type path: str
    """
    feature_class = FeatureClass(osjoin(path, POLYGON))
    for srow in feature_class.search():
        print srow.get_value(
            feature_class.shape_field_name).as_tuple()

# End example_geometry function

I don't know whether pygp has eliminated the need to count references to cursors and records or just omitted

del feature_class

from the example. I'd have looked in the code, but I couldn't find a link. I bet a lot of people would love to see it on GitHub.

Comments

Re: Geoprocessing for humans: pygp

Author: Pedant

No need to del feature_class - it's a local that will be garbage collected when example_geometry() returns.

Re: Geoprocessing for humans: pygp

Author: Sean

Correct in this case. But what would happen if I created a new FeatureClass instance after the above code in the same function, using the same parameters? Would I encounter locked files? My question isn't hypothetical: http://gis.stackexchange.com/questions/19408/arcgis-10-0-python-searchcursor-file-locking.

Re: Geoprocessing for humans: pygp

Author: Jason Humber

Backing up a little, in the 9.x world of arcgisscripting there was always an imposed need to delete the row and cursor as a means of closing them and dropping references. This usually looked like a del srow, srows statement placed after the while loop. Being forced to use a while loop and then del always felt odd to us, so in the cursor implementation we have in pygp we take care of closing cursors once the loop is exhausted.

So in this case there is no need to del feature_class since the locks on the feature class are dropped when the clean-up/closing is done at the end of the loop.
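A generator with try/finally is one way to get the behaviour described here. What follows is a minimal sketch of that pattern, not pygp's code: `FakeTable` and `search` are hypothetical names, and the `locked` flag merely stands in for a real file lock.

```python
class FakeTable(object):
    """Hypothetical stand-in for a feature class that can be locked."""

    def __init__(self, rows):
        self.rows = rows
        self.locked = False


def search(table):
    """Yield rows, holding the lock only while iteration is in progress."""
    table.locked = True
    try:
        for row in table.rows:
            yield row
    finally:
        # Runs when the loop is exhausted, and also if the generator is
        # closed early or garbage collected.
        table.locked = False


table = FakeTable(["row 1", "row 2"])
seen = []
for row in search(table):
    # Record the lock state observed while each row is being processed.
    seen.append((row, table.locked))
```

After the loop finishes, `table.locked` is False again without any del statement in the calling code.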

Re: Geoprocessing for humans: pygp

Author: Jason Humber

Just noticed the last line of your post, drop me an email...

Geoprocessing for humans: date and time

Fiona 0.7 roughly supports OGR date/time fields. Date, time, and datetime field values are turned into strings conforming to RFC 3339 "Date and Time on the Internet: Timestamps". Fiona is ignoring time zones in this version, but then OGR itself doesn't have much support for time zones, and neither do common vector data formats.

There's an example of adding a date type field to a shapefile in test_collection.py.

with collection("docs/data/test_uk.shp", "r") as source:
    schema = source.schema.copy()
    schema['geometry'] = 'Point'
    schema['properties']['date'] = 'date'
    with collection(
            "test_write_date.shp", "w", "ESRI Shapefile", schema
            ) as sink:
        for f in source.filter(bbox=(-5.0, 55.0, 0.0, 60.0)):
            f['geometry'] = {
                'type': 'Point',
                'coordinates': f['geometry']['coordinates'][0][0] }
            f['properties']['date'] = "2012-01-29"
            sink.write(f)

A look at the shapefile's feature table in QGIS shows that dates are being written correctly.

http://farm8.staticflickr.com/7141/6787537911_1312a73981_b_d.jpg

Reading that shapefile back in Fiona confirms that dates are read properly.

>>> from fiona import collection
>>> c = collection("test_write_date.shp", "r")
>>> from pprint import pprint
>>> pprint(c.schema)
{'geometry': 'Point',
 'properties': {'AREA': 'float',
                'CAT': 'float',
                'CNTRY_NAME': 'str',
                'FIPS_CNTRY': 'str',
                'POP_CNTRY': 'float',
                'date': 'date'}}
>>> for f in c:
...     print f['properties']['date']
...
2012-01-29
2012-01-29
2012-01-29
2012-01-29
2012-01-29
2012-01-29
2012-01-29

Be careful with this feature. Unless your data is destined for a legacy system, I think you're better off keeping track of time as RFC 3339 strings in a text field. Among other advantages, you'd gain millisecond precision and precise expression of UTC time offset.
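As an illustration of what such a string can carry, here is a minimal sketch using only Python's standard library (Python 3.6 or later for the `timespec` argument). The instant and the offset are arbitrary values chosen for the example, and nothing here is Fiona's own code.

```python
from datetime import datetime, timedelta, timezone

# An arbitrary instant with a fractional second, seven hours behind UTC.
offset = timezone(timedelta(hours=-7))
moment = datetime(2012, 1, 29, 8, 30, 15, 250000, tzinfo=offset)

# For a timezone-aware datetime, isoformat() yields a string in the
# RFC 3339 style: date, "T", time with milliseconds, then the UTC offset.
stamp = moment.isoformat(timespec="milliseconds")

# The same instant expressed in UTC.
stamp_utc = moment.astimezone(timezone.utc).isoformat(timespec="milliseconds")
```

Here `stamp` is "2012-01-29T08:30:15.250-07:00", a value that a plain date field could not hold but that a text field stores without loss.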

Comments

Re: Geoprocessing for humans: date and time

Author: Michael Weisman

Good to see more OGR field types supported.

Any reason for using a string representation of dates within Fiona rather than using Python's native datetime objects and converting them to the string representation OGR is expecting at write time?

Re: Geoprocessing for humans: date and time

Author: Sean

One of my goals for Fiona is to go light on classes, stick to built in Python types, and make sure that features can be trivially serialized to JSON. Datetime objects fail that last test. Try json.dumps(datetime.time(...)) and you'll get a TypeError. I did consider (year, month, day, hour, minute, second, millisecond) tuples, but RFC 3339 strings suit my needs better. They're ready to show to humans, sort them lexically and you get cheap temporal sorting, and they're easy to parse. Fiona has fiona.rfc3339.parse_date and friends which take strings and return tuples you could pass to the datetime constructors.
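The three claims above (JSON serialization, cheap sorting, easy parsing) can be checked with a short sketch. It uses only the standard library; the timestamps are arbitrary examples, and parsing is done with `datetime.strptime` rather than Fiona's own helpers, whose exact signatures are not shown in this post.

```python
import json
from datetime import date, datetime

# A date object is not JSON serializable out of the box.
try:
    json.dumps({"date": date(2012, 1, 29)})
    serializable = True
except TypeError:
    serializable = False

# The same value as an RFC 3339 string serializes without any help.
encoded = json.dumps({"date": "2012-01-29"})

# Lexical order matches chronological order as long as the strings share
# one format and one UTC offset.
stamps = [
    "2012-02-21T09:00:00Z",
    "2011-07-01T12:00:00Z",
    "2012-01-29T23:59:59Z",
]
ordered = sorted(stamps)

# Parsing back into a datetime with the standard library.
parsed = datetime.strptime(ordered[0], "%Y-%m-%dT%H:%M:%SZ")
```

The sorting shortcut only holds under the stated condition: strings with mixed offsets would have to be normalized to UTC first.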

PyCon

I'm going to be in Santa Clara, CA, Thursday evening through Sunday morning to attend PyCon US 2013. I'm not presenting, but I signed up to be the runner for a scientific Python session on Saturday afternoon. Although I'm working in the humanities, I'm still (just between you and me) a science and engineering type at heart, and I haven't found much of a humanities computing presence at PyCon. I hope to see you there.

The conference is moving to Montréal next year. Summer, I hope!

Comments

Re: PyCon

Author: Martin Davis

If I were you I would wish for Montreal in September. Summer is pretty hot and sticky! Fall is beautiful, though - especially if you happen to hit the 2-week window where the leaves change colour. (I know you get nice fall colours in the Front Range, but nothing beats Northeastern North America in the fall. Speaking as someone from the Wet Coast, where the dominant fall colour is misty green...)

Re: PyCon

Author: Sean

Good point. I'll happily roll with whatever the locals pick, though I'm more likely to be able to stick around for the sprints if it's a family vacation.

Geoprocessing for humans: a pip requirements file

In the geospatial software I'm writing and using these days, concerns are well separated. Fiona reads and writes features. Only. Shapely provides computational geometry algorithms. Only. Pyproj (not my work, but a favorite package) transforms coordinates between spatial reference systems. Only. The separation of concerns helps keep interactions between them predictable and as a user you pay only for what you eat.

A programmer-analyst's daily work has all the above concerns (and more, probably). A pip requirements file makes installing all three packages as easy as installing a single package like osgeo.ogr. I've uploaded one to GitHub: https://gist.github.com/1689767. This Gist includes an example of using Fiona, pyproj and Shapely together. Fetching them all, assuming you've got pip and the GDAL/OGR libs and headers already on your system, is just:

$ pip install -r https://raw.github.com/gist/1689767/mersh.txt
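If you haven't used one before, a requirements file is just a plain text list of packages, one per line, optionally with version pins. The following is a guess at the general shape of such a file for these three packages, not the actual contents of the Gist:

```text
Fiona
pyproj
Shapely
```

Saved locally under a name of your choosing, it installs the same way with pip install -r followed by the file name.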

Comments

Re: Geoprocessing for humans: a pip requirements file

Author: Paolo Corti

Hey Sean, really nice post, and just a couple of paragraphs long: well done.

I could read this easily even just after lunch :)

Re: Geoprocessing for humans: a pip requirements file

Author: Sean

That's my mantra: Omit needless words and code.