Better Python Practices for the GeoWeb

It pains me to see novices taught poor Python programming practices, and so I can't resist making a few corrections to this post. Processing and marking up data into KML is a simple task that can be used to teach better practices. Here are 3 easy ones:

  • abstract access to file-like resources with urllib2;
  • iterators and generators;
  • XML templating with Genshi.

Here is a script that reads in a FIRMS text file and writes out a KML document named fires.kml:

import urllib2

from genshi.template import TemplateLoader

def collect_latest_fires():
    f = urllib2.urlopen('file:N_America.A2007275.txt')
    for line in f:
        fields = line.split(',')
        lat = fields[0]
        lon = fields[1]  # "long" would shadow the Python 2 built-in type
        confidence = fields[8]
        coords = '%s,%s' % (lon, lat)  # KML wants lon,lat with no space inside a tuple
        yield {
            'name': 'Wildland Fire at %s' % coords,
            'description': 'Confidence: %s' % confidence,
            'coordinates': coords
        }

loader = TemplateLoader(['.'])
template = loader.load('template.kml')
stream = template.generate(collection=collect_latest_fires())

f = open('fires.kml', 'w')
f.write(stream.render())  # render the markup stream and write it out
f.close()

The input data file is hard-coded in the collect_latest_fires function, but it's trivial to calculate the name of the latest data file (a well-managed site would probably call it current.txt). You can get the input data via FTP or HTTP by using the proper URI scheme:

>>> f = urllib2.urlopen('')
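
To see why only the URI scheme has to change, here is a minimal sketch in Python 3, where urllib2 became urllib.request. The helper name read_lines and the sample rows are hypothetical, made up for illustration; the point is just that the reading code stays the same whether the URI is file:, http:, or ftp:.

```python
import os
import tempfile
from pathlib import Path
from urllib.request import urlopen  # Python 3 successor to urllib2


def read_lines(uri):
    """Yield decoded text lines from any URI that urlopen understands."""
    # The response object is file-like and iterable, just like a local file.
    with urlopen(uri) as f:
        for raw in f:
            yield raw.decode('utf-8').rstrip('\r\n')


# Hypothetical sample data written to a temporary file for demonstration.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('45.1,-120.5\n46.2,-121.6\n')

# A file: URI goes through the same code path a remote resource would.
local_uri = Path(tmp.name).as_uri()
lines = list(read_lines(local_uri))
os.remove(tmp.name)
```

Swapping local_uri for an http: or ftp: address should need no change to read_lines, which is the whole appeal of going through urlopen instead of open.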

By virtue of the yield statement, collect_latest_fires is a generator, an iterator that keeps track of its own state and computes values on demand. Iterators are a key element in Python programming. Note that Python file-like objects are themselves iterators. If you needed an actual list of fires, you could create it from the generator like so:

>>> collection = list(collect_latest_fires())
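
Generators also chain together, and nothing is computed until a consumer asks for it. The following is a rough sketch, not part of the script above: parse_fires and confident are hypothetical helpers, the sample rows are invented, and treating the confidence column as an integer is an assumption about the data.

```python
from itertools import islice


def parse_fires(lines):
    """Lazily turn comma-separated lines into dicts, one per detection."""
    for line in lines:
        fields = line.strip().split(',')
        # Assumption: column 8 holds an integer confidence value.
        yield {'lat': fields[0], 'lon': fields[1], 'confidence': int(fields[8])}


def confident(fires, threshold):
    """Pass through only detections at or above the confidence threshold."""
    return (fire for fire in fires if fire['confidence'] >= threshold)


# Hypothetical sample rows; only columns 0, 1 and 8 are used, as in the script.
sample = [
    '45.1,-120.5,a,b,c,d,e,f,90',
    '46.2,-121.6,a,b,c,d,e,f,40',
    '47.3,-122.7,a,b,c,d,e,f,75',
]

pipeline = confident(parse_fires(iter(sample)), threshold=70)
first = next(pipeline)            # computes just one value
rest = list(islice(pipeline, 5))  # drains what is left, up to five items
```

Because each stage pulls one item at a time from the stage before it, the same pipeline would work on a file with millions of rows without holding them all in memory.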

This generator function is the bulk of the script. The remainder simply loads a Genshi template, generates an output markup stream using an iterator over the latest collection of fires, and then renders and writes the stream.

Here is the KML template:

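<?xml version="1.0" encoding="utf-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2"
     xmlns:py="http://genshi.edgewall.org/">
  <Document>
    <Style id="fireIcon"><!-- icon definition not shown --></Style>
    <Placemark py:for="item in collection">
      <name py:content="item['name']">NAME</name>
      <description py:content="item['description']">DESCRIPTION</description>
      <Point><coordinates py:content="item['coordinates']">COORDINATES</coordinates></Point>
    </Placemark>
  </Document>
</kml>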

Use well-engineered templating systems and/or serializers to create KML. Do not concatenate hand-coded strings of angle brackets. Templates like the one above can be run through your favorite XML tools to ensure that they are well-formed. You can't do that with markup buried in Python string literals. A good templating system also handles the encoding for you.
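
To make the danger of concatenation concrete, here is a small sketch. It uses the standard library's xml.etree.ElementTree serializer rather than Genshi, purely so that it runs without extra packages, and the fire name is a made-up value chosen to contain characters that are special in XML.

```python
import xml.etree.ElementTree as ET

# A hypothetical fire name containing characters that are special in XML.
name = 'Fire near Smith & Sons <north ridge>'

# Naive concatenation copies the raw characters and yields malformed XML.
naive = '<Placemark><name>' + name + '</name></Placemark>'
try:
    ET.fromstring(naive)
    naive_is_well_formed = True
except ET.ParseError:
    naive_is_well_formed = False

# A serializer escapes text for us, so the output parses and round-trips.
placemark = ET.Element('Placemark')
ET.SubElement(placemark, 'name').text = name
serialized = ET.tostring(placemark, encoding='unicode')
round_tripped = ET.fromstring(serialized).find('name').text
```

The concatenated string fails to parse, while the serialized element comes back with the original name intact. A templating system such as Genshi gives the same protection for text substituted into a template.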

Finally, since the FIRMS source data is changing only once a day, you should be running the script above no more than once a day. Don't use it as a CGI. Transform the source data to KML and write it to a file under your web server. Configure your web server to provide the application/ content type for the KML and also set the HTTP Expires header to the modification time of the file plus 24 hours. That's a recipe that scales.
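
If you want to check what that Expires value should look like, it can be computed in a few lines. This is only a sketch for verifying a server configuration, not something to run per request; expires_header is a hypothetical helper name, and the temporary file stands in for the generated KML file.

```python
import os
import tempfile
from email.utils import formatdate

ONE_DAY = 24 * 60 * 60  # seconds


def expires_header(path, lifetime=ONE_DAY):
    """Return an HTTP-date for the file's modification time plus lifetime."""
    return formatdate(os.path.getmtime(path) + lifetime, usegmt=True)


# Demonstration on a temporary file with a known modification time
# (the Unix epoch), so that the result is easy to check by eye.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    pass
os.utime(tmp.name, (0, 0))
header_value = expires_header(tmp.name)
os.remove(tmp.name)
```

With the modification time pinned to the epoch, header_value is the HTTP-date exactly one day later, which is the form a client expects in the Expires header.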


Re: Better Python Practices for the GeoWeb

Author: Kristian

Bookmarked. Now I'm just waiting for part two, "TurboGears and Shapely vs. GeoDjango" ;)

Re: Better Python Practices for the GeoWeb

Author: Sean

TurboGears and Django overflow with good practices, so no need for that sequel.