Get with it

I don't know a nicer way to say this: if your Python software requires you to routinely delete objects, it is letting you down. I'm looking at you, ArcPy cursors:

import arcpy

cur = arcpy.SearchCursor("roads", '"TYPE" <> 4')

for row in cur:
    print "Name: %s,  CFCC code: %s" % \
          (row.NAME, row.CFCC)

del cur, row

The ArcPy cursor locks resources and this lock is removed (evidently, or update 2011-02-02: maybe not so evidently, see the comments) in the cursor object's __del__ method. This is a less than ideal design because, as the Python docs point out, del ob does not directly call ob.__del__(); it only decrements the reference count, and __del__ runs when that count reaches zero, whenever that may be. To their credit, the ArcPy docs acknowledge this issue:

When working in a Python editor, such as PythonWin, you may need to clean up object references to remove dataset locks set by cursors. Use the gc (garbage collection) module to control when unused objects are removed and/or explicitly delete references within your script.
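To see why del alone is not a dependable finalizer, here's a minimal sketch in plain Python (no ArcPy involved; the Lock class is just a stand-in for any object that frees a resource in __del__):

class Lock(object):
    """Stand-in for any object that releases a resource in __del__."""
    def __del__(self):
        print "lock released"

a = Lock()
b = a          # a second reference, easy to create by accident
del a          # does NOT call __del__; 'b' still holds the object
print "still locked"
del b          # only now does the reference count reach zero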

You're quite seriously failing your users if you make them turn to manual intervention in the garbage collection cycle, but the problem of finalizing the cursor state remains. What would be better? There are two obvious ways to go: emulate Python files (or the Python DB API), or use PEP 343's with statement.

In the first approach, one would add close methods to cursors that immediately and explicitly free up all locked resources. The end of the script above becomes:

for row in cur:
    print "Name: %s,  CFCC code: %s" % \
          (row.NAME, row.CFCC)

# This is too optimistic - finalizes only eventually, if ever
# del cur, row

# Better - finalizes immediately, let garbage collection delete it
# normally
cur.close()
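ArcPy doesn't provide such a close method today (the dir() listing in the comments below shows what the cursor actually exposes), but the shape of the idea is familiar from files and DB API cursors. Purely as a hypothetical sketch, with a class and attributes of my own invention rather than ArcPy's API:

class ClosingCursor(object):
    # Hypothetical sketch, not ArcPy API: an explicit, idempotent
    # close() in the style of file objects and DB API cursors.
    def __init__(self, cursor):
        self.cursor = cursor
        self.closed = False
    def __iter__(self):
        return iter(self.cursor)
    def close(self):
        if not self.closed:
            # A real close() would release the schema lock here; dropping
            # the reference is the best a pure-Python wrapper can do.
            self.cursor = None
            self.closed = True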

The with statement is more powerful, but more complicated. The problem PEP 343's with statement tries to solve is explained by Fredrik Lundh [effbot.org]:

As most other things in Python, the with statement is actually very simple, once you understand the problem it’s trying to solve. Consider this piece of code:

set things up
try:
    do something
finally:
    tear things down

Here, “set things up” could be opening a file, or acquiring some sort of external resource, and “tear things down” would then be closing the file, or releasing or removing the resource. The try-finally construct guarantees that the “tear things down” part is always executed, even if the code that does the work doesn’t finish.

It's a generalization of the problem an ArcPy cursor user faces: locking a data source during a session and returning the data source to its proper state at the end of the session. With context management, ArcPy could let a user write safe, foolproof code like this:

with arcpy.SearchCursor("roads", '"TYPE" <> 4') as cur:
    for row in cur:
        print "Name: %s,  CFCC code: %s" % \
              (row.NAME, row.CFCC)

# That's all - cursor finalizes when the 'with' block ends
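For comparison, here's roughly what that with block saves you from writing by hand, assuming the hypothetical close method from the earlier example:

cur = arcpy.SearchCursor("roads", '"TYPE" <> 4')
try:
    for row in cur:
        print "Name: %s,  CFCC code: %s" % \
              (row.NAME, row.CFCC)
finally:
    # close() is hypothetical here, as above; the point is that the
    # cleanup runs no matter how the loop exits
    cur.close()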

Even without changes in ArcPy, a user can start dodging cursor reference trouble right now by writing a few lines of adapter code.

class make_smarter:
    def __init__(self, cursor):
        self.cursor = cursor
    def __enter__(self):
        return self.cursor
    def __exit__(self, type, value, traceback):
        # self.cursor.close() would be better
        self.cursor.__del__()

with make_smarter(arcpy.SearchCursor("roads", '"TYPE" <> 4')) as cur:
    for row in cur:
        print "Name: %s,  CFCC code: %s" % \
              (row.NAME, row.CFCC)

The make_smarter class is a "context guard" (in standard terms, a context manager) for the cur object. Its __enter__ method is called when the with block begins and its __exit__ method is called when the block ends, for whatever reason, normal completion or exception alike. Python's file object, since version 2.5, implements this very same protocol.
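In other words, since 2.5 you can let a file clean up after itself; it's closed when the block ends, even if an exception is raised inside it (the filename here is just an example):

with open("roads.csv") as f:
    for line in f:
        print line.rstrip()

# f.closed is True at this point, exception or no exception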

A context guard for writing data with ogr could be equally useful and hide the unfortunately named method you call to flush data to disk.

class make_smartogr:
    def __init__(self, layer):
        self.layer = layer
    def __enter__(self):
        return self.layer
    def __exit__(self, type, value, traceback):
        self.layer.Destroy()

Take it from me, because I've felt the pain during development of Shapely: relying on the __del__ method of an object, as ArcPy does (or does not, see the comments; maybe it counts references instead, which I equally recommend against), will burn you sooner or later. The with statement adds dependability and safety to even the most straightforward data processing script.

Comments

Re: Get with it

Author: Kay

I noticed recently that Spatial Analyst's raster objects have similar problems. This is even worse in a way, as they are temporary files that can't be deleted or overwritten until you set their object to None first.

Doing something like this just doesn't work:

>>> raster += 5

So I have to do this:

>>> raster2 = raster + 5

>>> raster = None

>>> raster = raster2

>>> raster2 = None

I could just use numpy of course, then the first statement would work just fine.

Re: Get with it

Author: FMark

Hi Sean,

I like the idea, but have you actually tried this? It seems the Cursor object doesn't have a __del__ method:

>>> c = arcpy.SearchCursor(a)
>>> c

>>> dir(c)
['__class__', '__delattr__', '__dict__', '__doc__', '__eq__', '__format__', '__getattribute__', '__hash__', '__init__',
'__iter__', '__module__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__',
'__subclasshook__', '__weakref__', '_arc_object', '_go', 'deleteRow', 'insertRow', 'newRow', 'next', 'reset', 'updateRow']
>>>

Re: Get with it

Author: Sean

No, haven't tried it. Maybe there's a lock-owning attribute of the cursor with a __del__ of its own, or maybe not. I can imagine alternatives, but will stop guessing. Regardless, reference counting shouldn't be so visible in an API. The more explicit management of resources that I'm advocating remains the better way to go.

Re: Get with it

Author: FMark

I couldn't agree more. Especially since ESRI apparently went to so much trouble to make their API more pythonic.

Re: Get with it

Author: Jason Scheirer

Some of this has already been implemented for ArcGIS 10.1 with the new Data Access module cursors. Schema locking and object lifetime at the C++ level have gotten significantly more fine-grained across the board in the Python extensions.

Re: Get with it

Author: Sean

Fine-grained sounds good, but coarse or fine, a user shouldn't have to count references. At least not in Python ;)

Re: Get with it

Author: Jason Scheirer

Right, the new cursors both utilize the with statement to open/close connections and are smart enough, for example, to close the connection once a read cursor is exhausted (which is what I meant by finer-grained; the Python object is no longer just a dumb holder of a reference to a C++ object that relies on __del__ to release it). So no more dels, and you can safely use with statements to handle cursors' schema locks.