I'd like to see GIS students taught to program in Python using Python idioms, not Avenue idioms. I hate to pick on Utah State's GIS Programming with Python
just because of its popularity, but it contains some good introductory code
that can be easily tuned up to teach even better Python GIS programming skills. For example, let's look at Lesson 5: Writing/Reading Text Files, Python Lists & Dictionaries, part 5a:
# Author: John Lowry
# Date: Dec. 21, 2007
# Purpose: Lesson 5a: Use split and Write to a text file
#############################################################
#Import modules
import string
# Open a new text file to write to
outFile = open(r"C:\john\TeachingGIS\WILD6900_ArcGISPython\Lesson5_results\write_example.txt", "w")
# Make a string variable of featureclass names
fcString = "nests1990.shp,nests1995.shp,nests2000.shp"
# Make a list variable from the string variable using the split method, then print
fcList = fcString.split(",")
# Write each item in the list to a separate line the output file
outFile.write(fcList[0]+ "\n")
outFile.write(fcList[1]+ "\n")
outFile.write(fcList[2]+ "\n")
# Close the files
outFile.close()
There are 3 defects in that code:
- It imports the string module but never uses anything from it. You should almost always use string object methods anyway.
- It presumes knowledge of the number of items in the comma-separated string, specifically that it is at least 3.
- It needlessly concatenates newlines to items before writing.
The equivalent code, with none of those defects, looks like this:
outFile = open('/tmp/out.txt', 'w')
for item in fcString.split(','):
    outFile.write(item)
    outFile.write('\n')
outFile.close()
In Python 2.6 you can use a with statement to make it even more compact: the file is closed automatically at the end of the block. You can also use file.writelines to write the item and its newline in one call. Passing it a tuple is slightly cheaper than passing it a list.
with open('/tmp/out.txt', 'w') as f:
    for item in fcString.split(','):
        f.writelines((item, '\n'))
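To make the second defect concrete, here is a small sketch showing that the loop does not care how many items the string holds. The `write_items` helper and the in-memory `io.StringIO` buffer are my own illustrative stand-ins for the lesson's output file, not anything from the course.

```python
import io

def write_items(csv_string, out):
    """Write each comma-separated item of csv_string on its own line."""
    for item in csv_string.split(','):
        # Item and newline go out together; no concatenation needed.
        out.writelines((item, '\n'))

fcString = "nests1990.shp,nests1995.shp,nests2000.shp"

# Three items, as in the lesson.
buf = io.StringIO()  # stand-in for a file opened with open(..., 'w')
write_items(fcString, buf)
three_items = buf.getvalue()

# A single item works just as well; nothing indexes past the end.
buf = io.StringIO()
write_items("nests2005.shp", buf)
one_item = buf.getvalue()
```

One caveat: splitting an empty string yields a list with one empty item, so an empty input still writes a single blank line.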
In the next part of the lesson, 5b, we see:
# Author: John Lowry
# Date: Dec. 21, 2007
# Purpose: Lesson 5: Reading a and writing a textfile
#############################################################
# Open the text file in read mode
inFile = open(r"C:\john\TeachingGIS\WILD6900_ArcGISPython\Lesson5\nests2005_coords.csv", "r")
# Open a new text file to write to
outFile = open(r"C:\john\TeachingGIS\WILD6900_ArcGISPython\Lesson5_results\nests2005_format.txt", "w")
# Read entire file and print one line at a time
for line in inFile.readlines():
    nestList = line.split(",")
    id = nestList[0]
    cnd = nestList[1]
    x = nestList[2]
    y = nestList[3]
    outFile.write("Siteid: " + id + "\n")
    outFile.write("Condition: " + cnd + "\n")
    outFile.write("X Coordinate: " + x + "\n")
    outFile.write("Y Coordinate: " + y + "\n")
    outFile.write("\n")
# Close the files
inFile.close()
outFile.close()
The more effective version of the looping block is this:
for line in inFile:
    outFile.write(
        'Siteid: %s\nCondition: %s\nX Coordinate: %s\nY Coordinate: %s\n\n'
        % tuple(line.split(','))
    )
String formatting is more efficient than string concatenation (or not -- see the update below) and you can avoid needless variable assignments by using the split results directly.
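Two caveats apply to both the lesson's loop and my version of it. A line read from a file keeps its trailing newline, so the last field carries it into the output, and a line without exactly four fields makes the `%` operation raise a TypeError. A minimal sketch of one way to guard against both follows. The `format_nest` helper, the `TEMPLATE` name, and the choice to return None for malformed lines are my own assumptions, not part of the lesson.

```python
TEMPLATE = 'Siteid: %s\nCondition: %s\nX Coordinate: %s\nY Coordinate: %s\n\n'

def format_nest(line):
    """Format one 'id,condition,x,y' record, or return None if malformed."""
    # Strip the trailing newline so it does not end up in the last field.
    fields = line.strip().split(',')
    if len(fields) != 4:
        return None
    return TEMPLATE % tuple(fields)

sample = '1,good,433207.8362,4518107.044\n'
formatted = format_nest(sample)
skipped = format_nest('incomplete,record\n')
```

In the loop you would then write only the results that are not None, which silently skips bad records; raising an error instead may be the better choice if bad input should stop the script.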
I blogged before about how smelly the ArcGIS scripting cursor syntax was. I hear it's better now, but you can still see the old style in the USU course code.
Update (2009-10-21): Here's my benchmark script. I'm isolating just the inner part of the loop and focusing just on the extra assignments and file writes.
import timeit
# Sample input line
line = '1,good,433207.8362,4518107.044'
# A file-like object
class MockFile(object):
    def write(self, line):
        pass
outFile = MockFile()
# GIS Style programming. Assignment to intermediate variables
# and each written separately.
s1 = """\
nestList = line.split(',')
id = nestList[0]
cnd = nestList[1]
x = nestList[2]
y = nestList[3]
outFile.write(id)
outFile.write(cnd)
outFile.write(x)
outFile.write(y)
"""
t1 = timeit.Timer(
    stmt=s1,
    setup='from __main__ import line, outFile'
    )
print "GIS style"
print "%.2f usec/pass" % t1.timeit()
print
# Idiomatic Python. No intermediate variables and all written as
# a group
s2 = """\
outFile.write(''.join(line.split(',')))
"""
t2 = timeit.Timer(
    stmt=s2,
    setup='from __main__ import line, outFile'
    )
print "Idiomatic Python"
print "%.2f usec/pass" % t2.timeit()
print
The results:
$ python benchmarks.py
GIS style
2.07 usec/pass
Idiomatic Python
1.29 usec/pass
Someone else has looked at string performance more closely than I, and it looks like I'm wrong. On my Python 2.6, too, concatenation wins over formatting. Use of join is faster than concatenation for lists longer than 1000 items or so.
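If you want to check the concatenation versus formatting versus join question on your own interpreter, a rough sketch like the following should do. The statement strings, the `SETUP` and `STATEMENTS` names, and the repeat count are my own choices for illustration. It is written to run on a recent Python, and the numbers will vary with version and machine, so I'm not quoting any here.

```python
import timeit

# Split the sample record once, outside the timed statements.
SETUP = "id, cnd, x, y = '1,good,433207.8362,4518107.044'.split(',')"

# Three ways to build the same two-line string.
STATEMENTS = {
    'concatenation': "s = 'Siteid: ' + id + '\\n' + 'Condition: ' + cnd + '\\n'",
    'formatting': "s = 'Siteid: %s\\nCondition: %s\\n' % (id, cnd)",
    'join': "s = ''.join(['Siteid: ', id, '\\n', 'Condition: ', cnd, '\\n'])",
}

# Total seconds for 100000 executions of each statement.
timings = {}
for name, stmt in STATEMENTS.items():
    timings[name] = timeit.timeit(stmt, setup=SETUP, number=100000)

for name in sorted(timings):
    print('%-14s %.4f s' % (name, timings[name]))
```

With only a handful of short pieces per string, any difference between the three is likely to be small next to the cost of the file I/O itself.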