Python idioms for GIS Education

I'd like to see GIS students taught to program in Python using Python idioms, not Avenue idioms. I hate to pick on Utah State's GIS Programming with Python just because of its popularity, but it contains some good introductory code that can be easily tuned up to teach even better Python GIS programming skills. For example, let's look at Lesson 5: Writing/Reading Text Files, Python Lists & Dictionaries, part 5a:

# Author:  John Lowry
# Date: Dec. 21, 2007
# Purpose: Lesson 5a: Use split and Write to a text file
#############################################################

#Import modules
import string

# Open a new text file to write to
outFile = open(r"C:\john\TeachingGIS\WILD6900_ArcGISPython\Lesson5_results\write_example.txt", "w")

# Make a string variable of featureclass names
fcString = "nests1990.shp,nests1995.shp,nests2000.shp"

# Make a list variable from the string variable using the split method, then print
fcList = fcString.split(",")

# Write each item in the list to a separate line the output file
outFile.write(fcList[0]+ "\n")
outFile.write(fcList[1]+ "\n")
outFile.write(fcList[2]+ "\n")

# Close the files
outFile.close()

There are 3 defects in that code:

  • It imports the string module but never uses anything from it. You should almost always use string object methods anyway.
  • It presumes knowledge of the number of items in the comma-separated string, specifically that it is at least 3.
  • It needlessly concatenates newlines to items before writing.

The equivalent code, with none of those defects, looks like this:

outFile = open('/tmp/out.txt', 'w')
for item in fcString.split(','):
    outFile.write(item)
    outFile.write('\n')
outFile.close()

In Python 2.6 you can use a with statement to make it even more compact. The file closes itself at the end of the block. And you can use file.writelines to write the data and newline in one go. It's more efficient to pass it a tuple than to pass it a list.

with open('/tmp/out.txt', 'w') as f:
    for item in fcString.split(','):
        f.writelines((item, '\n'))

In the next part of the lesson, 5b, we see:

# Author:  John Lowry
# Date: Dec. 21, 2007
# Purpose: Lesson 5:  Reading a and writing a textfile
#############################################################

# Open the text file in read mode
inFile = open(r"C:\john\TeachingGIS\WILD6900_ArcGISPython\Lesson5\nests2005_coords.csv", "r")

# Open a new text file to write to
outFile = open(r"C:\john\TeachingGIS\WILD6900_ArcGISPython\Lesson5_results\nests2005_format.txt", "w")

# Read entire file and print one line at a time
for line in inFile.readlines():
    nestList = line.split(",")
    id = nestList[0]
    cnd = nestList[1]
    x = nestList[2]
    y = nestList[3]
    outFile.write("Siteid: " + id + "\n")
    outFile.write("Condition: " + cnd + "\n")
    outFile.write("X Coordinate: " + x + "\n")
    outFile.write("Y Coordinate: " + y + "\n")
    outFile.write("\n")

# Close the files
inFile.close()
outFile.close()

The more effective version of the looping block is this:

for line in inFile:
    outFile.write(
      'Siteid: %s\nCondition: %s\nX Coordinate: %s\nY Coordinate: %s\n\n' \
      % tuple(line.split(','))
      )

String formatting is more efficient than string concatenation (or not -- see the update below) and you can avoid needless variable assignments by using the split results directly.

I blogged before about how smelly the ArcGIS scripting cursor syntax was. I hear it's better now, but you can still see the old style in the USU course code.

Update (2009-10-21): Here's my benchmark script. I'm isolating just the inner part of the loop and focusing just on the extra assignments and file writes.

import timeit

# Sample input line
line = '1,good,433207.8362,4518107.044'

# A file-like object
class MockFile(object):
    def write(self, line):
        pass

outFile = MockFile()

# GIS Style programming. Assignment to intermediate variables
# and each written separately.

s1 = """\
nestList = line.split(',')
id = nestList[0]
cnd = nestList[1]
x = nestList[2]
y = nestList[3]
outFile.write(id)
outFile.write(cnd)
outFile.write(x)
outFile.write(y)
"""

t1 = timeit.Timer(
    stmt=s1,
    setup='from __main__ import line, outFile'
    )
print "GIS style"
print "%.2f usec/pass" % t1.timeit()
print

# Idiomatic Python. No intermediate variables and all written as
# a group

s2 = """\
outFile.write(''.join(line.split(',')))
"""

t2 = timeit.Timer(
    stmt=s2,
    setup='from __main__ import line, outFile'
    )
print "Idiomatic Python"
print "%.2f usec/pass" % t2.timeit()
print

The results:

$ python benchmarks.py
GIS style
2.07 usec/pass

Idiomatic Python
1.29 usec/pass

Someone else has looked at string performance more closely than I, and it looks like I'm wrong. On my Python 2.6, too, concatenation wins over formatting. Use of join is faster than concatenation for lists longer than 1000 items or so.

Comments

Re: Python idioms for GIS Education

Author: Eric Wolf

Sean - you obviously haven't tried teaching Python to Geographers. I find myself explaining over and over again how

with

works to grad students who've been writing Python code regularly for their advisors. And as for breaking out the

write

statements, my friend who regularly TA's the graduate Python class at CU-Boulder and I had a contest to see which of us could write the program for a lab in the fewest number of lines. The end result: combining program functionality into fewer lines of code destroys the readability and maintainability of the code. In the example above, it's pretty clear that the output spans multiple lines because the code spans multiple lines. But your rewrite requires that the student remember that /n means "start a new line".

And as for dropping the ".readlines()", I'd rather see the student specify the action directly rather than assuming it "just happens". For people who work in multiple languages, over use of "magic variables" or "magic functions" only makes the code harder to understand.

So here's the big question: Are your code rewrites significantly more efficient than the original in terms of execution?

Re: Python idioms for GIS Education

Author: Regina

Eric,

Honestly -- you can't say the first example is better than the second. I don't care what language you program in, the first code snippet is fraught with flaws.

Coming from a SQL/VB.NET/PHP/C# mindset, that Sean's example is much more understandable and clearly better.

Now as for the second example, I see your point that someone new to Python may be a bit at a disadvantage understanding this -- just like anyone seeing SQL for th first time would be disadvantaged seeing a join statement. That doesn't mean I should stop writing join statements since that is the strength of the language.

To some extent I would argue though that a fundamental reason aside from libraries available for it, are the succintness of its idioms. If you are going to write Python code, write Python code like a Python programmer. Anything else would just be insulting.

Re: Python idioms for GIS Education

Author: Sean

If you write slow code, you get a slow program. If you make a lot of needless assignments, you get a big, slow, program. It's not just about succinctness of the source. The Python idioms are usually backed up by optimized code and execute faster than pseudocode. How much faster? I'll benchmark it.

My first two examples are much more readable and maintainable than the original. That's a long string template in the last, granted, but there are constructions that can make it more readable. Python's triple-quoted text, for example:

template = '''\
Siteid: %s
Condition: %s
X Coordinate: %s
Y Coordinate: %s
'''

for line in inFile:
    outFile.write(template % tuple(line.split(',')))

I've taught Python to geographers a few times. I even remember being thanked and told that the lesson helped. Most of them were open source nuts and hackers at heart, it's true.

Re: Python idioms for GIS Education

Author: Clemens

Does your second suggestion really work? I'm still on Python 2.5 and I need to convert the result of the split to tuple in order to avoid "TypeError: not enough arguments for format string".

Re: Python idioms for GIS Education

Author: Sean

Indeed. Thanks, Clemens. I've corrected it.

Re: Python idioms for GIS Education

Author: Sean

I've previously encountered the opinion that non-idiomatic code is better for GIS, but I think it's misguided. Write Ruby like it is intended to be written. Write Python like it is intended to be written. One such reference for Python I've found is David Goodger's Code Like a Pythonista: Idiomatic Python. I'd love to see this near the top of Python GIS programming course reading lists.

It's not as if, to use a software-carpentry analogy, USU students are being taught to drive nails with the claw end of the hammer. The course is pretty good. But it's not quite teaching students how to drive nails with the least number of blows.

Re: Python idioms for GIS Education

Author: Chris

Valid points, Sean.

But I thought I'd point out that most of the students that take this course know absolutely nothing about programming. It's hard enough getting basic concepts across -- I wouldn't want to try to explain the string formatting to them when some of them are lucky to even get stuff to print out right. Also, John (the guy who wrote this code) is not a programmer. But he likes tinkering with code occasionally and thought it would be fun to teach a class. Which was fine by me, because it meant I got to teach the open source part of the course. And I'm sure there are problems with my code, too...

Re: Python idioms for GIS Education

Author: Eric Wolf

As Chris said, the students encountered in geography programs (including grad students) commonly know absolutely nothing about programming. Extending this comment:

1. These students made it to college (even grad school), they aren't stupid... but

2. They arrived at that level without knowing how to program.

So, why is it that intelligent people can reach such a high level of education without learning to program? Is it because they weren't exposed to programming in elementary school (maybe)? Or is it that these people aren't drawn programming (likely)?

In the case of people not having exposure, some of these people pick up programming quite quickly. They seem to be successes and are the type of student who appreciate learning how to optimize code (for speed, clarity, etc). But the other case of people, they don't need to learn to optimize code. They have an approach to problem solving that works - but doesn't necessarily fit with programming. They likely need to write a few scripts to automate geoprocessing for their problems. This second class of students are well-understood and targeted by ESRI. Hence Avenue!

On the positive side, these students aren't going to develop large programs in Python. So a little code bloat isn't going to bother them. They are also comparing their scripts to how well the same methods run in the ArcToolbox. The speed bar is set pretty low.

I really appreciate your examples. I would include them in a text book for Geography students - but I would present the clunky Avenue-style code first. For the students who do want to make their code more efficient, they can look at the refined code. But I sure as heck wouldn't start off trying to teach geography students how to right optimal Python code.