Python logging

Update (2009-09-29): see http://plumberjack.blogspot.com/2009/09/python-logging-101.html

A friend at the local Park Service office mentioned to me that he's become a Python user and has been looking at some of my published code examples. I'm inspired to post more of these, and more that are useful to the ESRI ArcGIS scripting user. Python serves the ESRI community as a replacement for Avenue, but is a far richer language and platform. Today's tip: excellent application logging, and how to avoid print statements in your production code.

I tripped myself up with print statements recently, and have seen the same in the code of others. While not a mortal sin, it does reduce the code's potential for reuse. A print statement might be tolerable when you're running the code at a shell prompt, but what if you wanted to execute the same code in a service? Better not to fill your service logs (or wherever stdout goes) with unnecessary debugging noise, yes? Here's where Python's logging module steps in to help: your information-processing classes write messages to a logging interface, and any Python application can then configure the logger appropriately. Log messages can be sent to single files on disk, to rotating log files, or to SMTP servers -- you could probably even create your own XMPP handler.
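To illustrate that flexibility, here is a minimal sketch (the logger name matches the example below; the file path is just an example) that routes the same messages to a size-rotated log file instead of the console:

```python
import logging
import logging.handlers
import os
import tempfile

# A named logger, exactly as the classes doing the work would use.
log = logging.getLogger('file-patcher')
log.setLevel(logging.INFO)

# Rotate the file at ~100 KB, keeping 3 old copies.
logfile = os.path.join(tempfile.mkdtemp(), 'patcher.log')
handler = logging.handlers.RotatingFileHandler(
    logfile, maxBytes=100 * 1024, backupCount=3)
handler.setFormatter(
    logging.Formatter('%(asctime)s %(name)s %(levelname)s %(message)s'))
log.addHandler(handler)

log.info('Patched perms in %s.', 'example.xml')
handler.close()

print(open(logfile).read().strip())
```

Swapping in an SMTPHandler, or your own handler class, is the same one-line change; none of the code writing the log messages has to know where they end up.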

Say, for example, you have a class that patches elements into XML files (inspired by Dave Bouwman's post). Using a Python logger instead of a print statement requires only 2 extra lines of code (zero if you use the root logger):

import getopt
import glob
import logging
import sys
from lxml import etree

# Create a logger
log = logging.getLogger('file-patcher')

# Example of reusable code
class FilePatcher(object):
    """
    Patch edit permissions into an XML configuration file and write new
    files to disk.
    """
    def __call__(self):
        for infilename in glob.glob('*.xml'):
            tree = etree.parse(infilename)
            log.info('Opening file %s.', infilename)
            root = tree.getroot()
            for perms in root.xpath(".//property[@name='permissions']"):
                if not perms.xpath("element[@value='Edit']"):
                    e = etree.SubElement(perms, 'element')
                    e.attrib['value'] = 'Edit'
                    log.info('Patched perms: %s.', str(perms))
            outfilename = 'patched-%s' % infilename
            fout = open(outfilename, 'w')
            fout.write(etree.tostring(root, pretty_print=True))
            fout.close()
            log.info('Wrote file %s.', outfilename)

Now, we can make a script that uses this class in a verbose (all logged info) mode or a quieter (logged errors only) mode:

# Script
if __name__ == '__main__':

    verbose = False

    # Get command line options and args
    opts, args = getopt.getopt(sys.argv[1:], 'v')
    for o, a in opts:
        if o == '-v':
            verbose = True

    # Get the file patching logger
    log = logging.getLogger('file-patcher')

    # Logging configuration, to stdout in this case
    console = logging.StreamHandler()
    log.addHandler(console)

    if verbose:
        log.setLevel(logging.INFO)
    else:
        log.setLevel(logging.ERROR)

    # Process data
    patch_files = FilePatcher()
    patch_files()

Duke University's GeoEco project has more examples of professional logging, many in an ArcGIS scripting context.

Comments

Re: Python logging

Author: timmie

Hi! Thanks for the post. I recently discovered that this module exists but haven't yet managed to use it, particularly because I still use many "prints" for fast debugging on the command line. I really do not know how to get around this. I am not a full-time software developer, and professional debugging/testing seems to be a lot of overhead at my current stage of knowledge. > I'm inspired to post more of these, and more that are useful to the ESRI ArcGIS scripting user. I would be delighted to see more of these posts. As outlined above, gentle introductions to good coding practices can help a lot: * Fast, pragmatic debugging * Using tests in the development of scientific/geospatial scripts. Kind regards.

Re: Python logging

Author: oblivion

I like to do this:
    import logging
    log = logging.getLogger(__name__)
Then, log messages are always identified by their sub-packages of origin, and they can even be filtered on that basis. Also, if you do this:
    log.info('Opening file %s in mode %s.', infilename, mode)
then the runtime penalty is tiny, since the string substitution does not occur if the log record is filtered, but I still prefer to do this:
    log.info('Opening file %s in mode %s.' % (infilename, mode))
The substitution definitely occurs only once, no matter how many handlers exist. I simply avoid logging anything in runtime-critical sections of code.
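That deferred behavior is easy to check; a small stdlib-only sketch (the Costly class is contrived purely to count string conversions):

```python
import io
import logging

class Costly(object):
    """Counts how many times it is converted to a string."""
    def __init__(self):
        self.conversions = 0
    def __str__(self):
        self.conversions += 1
        return 'costly value'

stream = io.StringIO()
log = logging.getLogger('lazy-demo')
log.addHandler(logging.StreamHandler(stream))
log.propagate = False  # keep records away from the root logger's handler
log.setLevel(logging.ERROR)

value = Costly()
log.info('value is %s', value)   # filtered out: no substitution at all
log.error('value is %s', value)  # emitted: substitution happens once
print(value.conversions)         # 1
```

With the logger set to ERROR, the INFO call never formats its arguments at all, and the ERROR call formats them exactly once for the attached handler.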

Re: Python logging

Author: Allan Doyle

My contribution (works on Python 2.5, haven't done much Python hacking lately):
#! /usr/bin/env python

"""
debugging - inspired by the debugging.h and error_handling.c modules
            of OpenMap's C-based "toollib" that I originally wrote
            in the 90's while at BBN.


Debug([word])

        Looks in the current environment (via getenv) for an
        environment variable called 'DEBUG'.

        If there is none, then it returns False.
        If there is one, and Debug was called with an argument, it
        returns True if the DEBUG variable contains that argument,
        and False otherwise. Note that it does this with 'word'
        boundaries.

        Examples:

        Assume no DEBUG environment variable is set
                Debug() --> False
                Debug("foo") --> False
                Debug("bar") --> False

        Assume DEBUG is set to "foo foobar"

                Debug() --> True
                Debug("foo") --> True
                Debug("bar") --> False

DebugSet([word])

        Allows you to add strings to the set that are in the DEBUG
        environment variable.

        Examples:

        Assume DEBUG is set to "foo foobar"
                DebugSet("bar")
                Debug("bar") now returns True

DebugUnset([word])

        If called with no arguments, it makes all subsequent calls to Debug
        return False

        If called with a word, it makes all subsequent calls to Debug with
        that word return False

DebugMessage(message, [level])

        message is a string
        level is one of "CRITICAL", "ERROR", "WARNING", "INFO", "DEBUG"

        Prints the message to stderr, along with date, time, level, and
        source file name and line number where the call was made.
"""

# ------------------------------------------------------------------------
#
# Copyright (c) 2006 Allan Doyle
#
#  Permission is hereby granted, free of charge, to any person
#  obtaining a copy of this software and associated documentation
#  files (the "Software"), to deal in the Software without
#  restriction, including without limitation the rights to use, copy,
#  modify, merge, publish, distribute, sublicense, and/or sell copies
#  of the Software, and to permit persons to whom the Software is
#  furnished to do so, subject to the following conditions:
#
#  The above copyright notice and this permission notice shall be
#  included in all copies or substantial portions of the Software.
#
#  THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
#  EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
#  MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
#  NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
#  HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
#  WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
#  OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
#  DEALINGS IN THE SOFTWARE.
#
# ------------------------------------------------------------------------


import inspect
import logging
import os
import os.path
import sys

## Set up the Debug part of this code...

__vars = {}
__debug = os.getenv("DEBUG")

if __debug:
    __vars[' '] = ' '                   # we use ' ' as a marker for DEBUG
                                        # being set, even if it has no content

    try:
        # each separate string in DEBUG gets used as a key in a dictionary
        # later on, this makes it easy to tell if that string was set
        for v in os.getenv("DEBUG").split():
            __vars[v] = v

    except AttributeError:
        pass
    except TypeError:
        pass

def Debug(name=" "):
    """
    Checks to see if the "DEBUG" environment variable is set, if
    an optional string is passed in, if that string is set in the
    "DEBUG" environment variable.
    """
    global __vars
    return name in __vars

def DebugSet(name=" "):
    """
    If called with no argument, causes subsequent calls to Debug() to
    return True. If called with an argument, causes subsequent calls
    to Debug() with the same string to return True
    """

    global __vars
    global __debug

    __debug = True
    __vars[name] = name
    return(True)

def DebugUnset(name=" "):
    """
    If called with no argument, causes subsequent calls to Debug() to
    return False. If called with an argument, causes subsequent calls
    to Debug() with the same string to return False.
    """
    global __vars
    global __debug

    if name == " ":
        __debug = False
        __vars = {}
        return

    try:
        del __vars[name]

    except KeyError:
        pass

    return

## Set up the DebugMessage part of this. We use the logging module,
## but send everything to stderr. Someday, this could get expanded
## to also have a logfile, etc.

logging.basicConfig(level=logging.DEBUG,
                    format='%(levelname)-8s %(message)s',
                    stream=sys.stderr)


levels = {'CRITICAL':logging.critical,
          'ERROR':logging.error,
          'WARNING':logging.warning,
          'INFO':logging.info,
          'DEBUG':logging.debug}


def DebugMessage(msg, level="DEBUG"):
    """
    Produces nicely formatted output to stderr.
    If called with the wrong level, it complains, and behaves
    as though the level was "ERROR"
    If called with no level, it uses "DEBUG"
    The allowed levels are
      "CRITICAL", "ERROR", "WARNING", "INFO", "DEBUG"
    """

    current = inspect.currentframe()
    outer = inspect.getouterframes(current)

    try:
        levels[level]('%s - %4d: %s' %
                      (os.path.basename(outer[1][1]),
                       outer[1][2],
                       msg))

    except KeyError:
        DebugMessage('DebugMessage() called with unknown level: %s' % level)
        logging.error('%s - %4d: %s' %
                      (os.path.basename(outer[1][1]),
                       outer[1][2],
                       msg))
    del outer
    del current

__version__ = '$Id$'
if Debug('version'): print __version__

Then, to use it:

# Grab some of the debugging stuff...
from debugging import Debug, DebugSet, DebugMessage

# Put this in every module. Later on, you can 'setenv DEBUG version'
# and your code will print out the version of every module when it runs
__version__ = '$Id$'
if Debug('version'): print __version__

# Now you can do stuff like

if Debug("time"): print "a. %.2f %s" % (time.time()-self.start, path)

# or
if of.GetLayerCount() == 0:
    DebugMessage("No layers in %s" % filename)

# or
if Debug("filename") or Debug("verbose"): print "fileinfo", filename




Re: Python logging

Author: Sean

Wow, my blog comments do not render code well at all. Good tips, oblivion and Allan. Testing and debugging are big topics. To get started with debugging Python, you could do a lot worse than to add
import pdb; pdb.set_trace()
before a troubled section of code. Run your script and set_trace() gives you a Python prompt at that point. From the prompt you can inspect all objects in scope, step through statements, and go up (and back down) the stack of execution frames. Jeremy Jones's pdb tutorial explains all you need to begin.

Re: Python logging

Author: Kyle VanderBeek

I generally recommend against creating logger objects at load time (which will happen due to where your library's getLogger() call is). I do it in the __init__ of an object, and use both the module name and object name together to form a proper category hierarchy:
def __init__(self):
    self._log = logging.getLogger('%s.%s' % (__name__, self.__class__.__name__))
This idiom gives you a dot-separated hierarchy of logger categories, allowing you to configure filters and handlers to send different categories to different destinations, mute some module's log messages, or turn up and down levels in a more granular fashion. Lastly, you're working too hard configuring the log. Look at logging.basicConfig(). And, actually, you should be configuring the root logger (by calling logging.getLogger() with no arguments), not one named with a particular category, in your main program. This will become important as your project grows to multiple modules.

Re: Python logging

Author: Sean

Great tips, Kyle. Thanks.

Change

Update (2008-11-07): Restoring the country's economy, confidence, dignity, and reputation will be tough; I expect to be disappointed along the way, but right now I am enormously pleased. Fort Collins, Larimer County, Colorado, and the United States of America expressed enough hope over fear, and rightly held the GOP accountable for getting us into this ditch.

http://www.barackobama.com/images/widgets/Obama08_ThumbLogo200.gif

After 7 disastrous years of mal-intent and incompetence, let's give the hopeful and serious a chance. It's time for a change.

Shapely 1.0.8

Paul Winkler and the OpenGeo team found a problem with Shapely and Zope's web server on their 64-bit system, provided the fix, and now here's Shapely 1.0.8. An upgrade is recommended if you're using Zope or a similar framework.

Comments

Re: Shapely 1.0.8

Author: Paul Winkler

Hi Sean, thanks for the kind words and getting an official fix out so quickly. I must give credit where due: All the hard gdb sleuthing was led by my coworkers David Turner and Douglas Mayle. I don't have the necessary C/C++ chops to have solved this on my own. Oh, and to clarify, the problem had nothing to do with Zope (or Grok in our case), this upgrade should be good for anyone running on a 64-bit system. I was eventually able to write a standalone script that demonstrated the segfault. The same problem could afflict any python package that uses Ctypes on 64-bit systems without explicitly marking argument and return types.

Multiprocessing with Rtree

I came up with an example of how Python's multiprocessing package -- standard in 2.6, and recently backported to 2.4 and 2.5 (on PyPI) -- could be used to set up a simple R-Tree index server:

from multiprocessing.managers import BaseManager

class RtreeManager(BaseManager):
    pass

RtreeManager.register('add')
RtreeManager.register('intersection')

if __name__ == '__main__':

    from rtree import Rtree

    class NoisyRtree(Rtree):
        def add(self, i, bbox):
            print "Adding: %s, %s" % (i, str(bbox))
            Rtree.add(self, i, bbox)
        def intersection(self, bbox):
            print "Searching: %s" % str(bbox)
            return Rtree.intersection(self, bbox)

    index = NoisyRtree('processing')

    RtreeManager.register('add', index.add)
    RtreeManager.register('intersection', index.intersection)

    manager = RtreeManager(address=('', 50000), authkey='')
    server = manager.get_server()
    print "Server started"
    server.serve_forever()

Run that module (man.py) as a script to start the server, and access the index from Python in a new process like this:

>>> from man import RtreeManager
>>> manager = RtreeManager(address=('', 50000), authkey='')
>>> manager.connect()
>>> print manager.intersection((-20.0, -20.0, 20.0, 20.0))
[1L, 2L, 5L]

Three items were already in my index, persisted on disk. One more can be added like this:

>>> manager.add(4, (-10.0, -10.0, -9.0, -9.0))
<AutoProxy[add] object, typeid 'add' at 0x-483da894>
>>> print manager.intersection((-20.0, -20.0, 20.0, 20.0))
[1L, 2L, 5L, 4L]

The manager synchronizes access so additions and queries from different processes don't clobber each other.

Geo-enabling CouchDB

Interesting. It occurs to me that SQLite, even, is overkill for this purpose. A relational database isn't necessary and the unavoidable duplication of data is undesirable. All that is needed is an R-Tree index and, eventually, the capacity to use spatial operators and predicates in views. The Rtree package that Howard and I have been working on could serve well. It need store only the keys of CouchDB documents, so there's no duplication of spatial data. SpatiaLite (which appeals to me in many ways) uses the same geometry library as Shapely does (GEOS), so a Python query using Rtree and Shapely has essentially the same implementation as an OpenGIS SQL query of SQLite/SpatiaLite. I suggested in comments that GeoCouch might want to take advantage of the GeoJSON group's work, on geometry formatting at least.
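The keys-only idea is simple enough to sketch without any libraries at all. This toy stands in for Rtree with a plain list and brute-force bounding-box tests (a real index would be far faster, and real geometries need more than points):

```python
# Documents live in CouchDB; here a dict stands in for the database.
docs = {
    'doc1': {'type': 'Point', 'coordinates': [0.0, 0.0]},
    'doc2': {'type': 'Point', 'coordinates': [50.0, 50.0]},
}

def bbox(geom):
    # Points only, for brevity: a degenerate box at the point itself.
    x, y = geom['coordinates']
    return (x, y, x, y)

# The index holds only document keys and bounding boxes -- no
# duplicated geometry data.
index = [(key, bbox(geom)) for key, geom in docs.items()]

def intersection(query):
    """Return keys of documents whose boxes overlap the query box."""
    minx, miny, maxx, maxy = query
    return [key for key, (x0, y0, x1, y1) in index
            if x1 >= minx and x0 <= maxx and y1 >= miny and y0 <= maxy]

print(intersection((-20.0, -20.0, 20.0, 20.0)))  # ['doc1']
```

A query returns keys, and the application fetches the matching documents from the database; the geometry itself is stored exactly once.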

Via Simon Willison.

Comments

Re: Geo-enabling CouchDB

Author: Jan Lehnardt

Hi, that's what we are thinking :) CouchDB's external-indexing API is fairly simple. What Volker did is a proof of concept of how to get something up and running as fast as possible. Working directly with Rtree would be pretty cool. Are you up for some collaboration? :) Cheers, Jan

Re: Geo-enabling CouchDB

Author: Volker Mische

Hi, I actually tried the R-Tree package first, then I found SpatiaLite. I wanted to keep it as minimal as possible and, as you say, I only need an R-Tree. The reason for switching was file locking. There might be an easy workaround for having one script with write access and another with read-only access, but SpatiaLite offers equal functionality without much overhead (and as Jan already said, I wanted to get something running asap). Another reason was that the spatial index seemed to break when I added too many points (>300000 iirc), though I haven't tested that with SpatiaLite yet. When I can overcome these issues I'd be happy to give the R-Tree package another chance and see if I can work it out, as it really does everything I need (especially Shapely). About GeoJSON: sounds like a good idea, though I might use only a GeoJSON-like structure to make lat/lon vs. lon/lat more obvious. I don't want to confuse people, as I will use lat/lon in the future, and the preferred GeoJSON coordinate order is lon/lat.

Re: Geo-enabling CouchDB

Author: Sean

I can't fault anybody for using a working solution and getting things done, and had no idea you'd tried Rtree already. I'm using it in an application, one that doesn't have to be as concurrent as yours. Do you think it would be helpful to queue Rtree access, perhaps using multiprocessing's Manager class? Having asked the question, I really must try it. About GeoJSON coordinate order: we got it right :) What you get out of following it is easy integration with OpenLayers, Shapely, etc. I also can't deny that if all you have is points, {"lat": 0.0, "lon": 0.0} is most clear ... but only until you encounter different coordinate systems.

Re: Geo-enabling CouchDB

Author: brentp

Knowing very little about CouchDB, except that it uses a btree, this seems like a good application for geohash.

Re: Geo-enabling CouchDB

Author: Volker Mische

Sean, yes I know that you got the coordinate order right. I might use something like coordinate_order and make the default order lon/lat, as I want to support more than just points. But that's a detail, I'd rather spend some time on trying the R-Tree package again :)

Zotero Update

The key paragraph from George Mason University's response to the Reuters lawsuit, via Dan Cohen:

The Thomson Reuters Corporation has sued the Commonwealth of Virginia over Zotero, a project based at George Mason University’s Center for History and New Media (CHNM). A free and open-source software initiative, Zotero aims to create the world’s best research tool and has already been adopted by hundreds of thousands of users at countless colleges and research universities. CHNM announces that it has re-released the full functionality of Zotero 1.5 Sync Review to its users and the open source community. As part of its formal response to this legal action, Mason will also not renew its site license for EndNote.

The Geospatial-Military-Industrial Complex blogs

The got geoint? blog aims to be "fun" and "hip". I'm not the only one who's skeptical. There's no rule of "Web 2.0" that says one must be fun and hip. Grim, steely resolve is fine too. Be yourselves. Perhaps "fun and hip" is cover, but having announced that the blog will be fun and hip, the cover is blown ... or is it? An overly-enthusiastic blog could be cover, deep cover, for something else. I'm just saying. Will it emulate fun and hip security-related blogs like Danger Room, Threat Level, and Boing Boing? Will there be cats and squid -- giant squid with low-light video cameras and microphones -- on Fridays? Satellite imagery of James Fee clipping his toenails on the back doorstep in his plaid flannel bathrobe? YouTube videos of Steve Coast running from drone aircraft, followed by a too-funny post about submitting his futile panicked route to OSM? It's not like the military-industrial complex can't afford the fun that would take a blog to the next level.

Happy birthday, Ursula K. Le Guin

79 today. Geography was a key ingredient in her Earthsea novels (which I read again earlier this year, along with The Other Wind for the first time). Did you know that her website has maps of Earthsea?

http://farm4.static.flickr.com/3234/2961979462_6353fecd06_d.jpg

Is that crayon? I've been looking at a lot of crayon coloring lately, and would swear it is. Cool! Maps of Archaic Latium from her new novel, Lavinia, are also available from her site.

Comments

Re: Happy birthday, Ursula K. Le Guin

Author: matt

I just reread the series this year too. I loved the maps, finding myself thumbing back to them just about every time locations are mentioned in the novels. The geography is extremely important to the storyline.