Bootstrapping a Python project

Here are my notes on starting a brand new, versioned, readily distributed Python project. Examples show a bash session, but Python, virtualenv, pip, distribute, paster, and hg all work on Windows (from whence more and more Python GIS programmers come) as well.

1. Create a fresh virtual environment. Why? So you don't clutter your system Python with in-development code, and to keep possibly conflicting versions of dependencies out of your development environment. It's probably even more useful for working on code that we may clone from another repository than it is for starting from scratch.

$ virtualenv --distribute foo
New python executable in /tmp/foo/bin/python2.6
Also creating executable in /tmp/foo/bin/python
Installing distribute....done.

2. Install paster if it wasn't already installed under our original Python (or if you used the --no-site-packages option). It's a script that creates a basic, normal source layout for a new package, prompts us for essential project metadata, and writes a working setup.py.

$ cd foo
$ ./bin/pip install PasteScript
Downloading/unpacking PasteScript
  Downloading PasteScript-1.7.3.tar.gz (127Kb): 127Kb downloaded
  Running setup.py egg_info for package PasteScript
Downloading/unpacking Paste>=1.3 (from PasteScript)
  Downloading Paste-1.7.2.tar.gz (373Kb): 373Kb downloaded
  Running setup.py egg_info for package Paste
Downloading/unpacking PasteDeploy (from PasteScript)
  Downloading PasteDeploy-1.3.3.tar.gz
  Running setup.py egg_info for package PasteDeploy
    warning: no files found matching 'docs/*.html'
    warning: no previously-included files found matching 'docs/rebuild'
Installing collected packages: Paste, PasteDeploy, PasteScript
...
Successfully installed Paste PasteDeploy PasteScript

3. Create the new project.

$ ./bin/paster create -t basic_package foogis
Selected and implied templates:
PasteScript#basic_package  A basic setuptools-enabled package

Variables:
  egg:      foogis
  package:  foogis
  project:  foogis
Enter version (Version (like 0.1)) ['']: 0.1
Enter description (One-line description of the package) ['']: FooGIS
Enter long_description (Multi-line description (in reST)) ['']:
Enter keywords (Space-separated keywords/tags) ['']: gis
Enter author (Author name) ['']: Sean Gillies
Enter author_email (Author email) ['']: sean@example.com
Enter url (URL of homepage) ['']: http://example.com/foogis
Enter license_name (License name) ['']: DWTFYWWI
Enter zip_safe (True/False: if the package can be distributed as a .zip file) [False]:
Creating template basic_package
Creating directory ./foogis
  Recursing into +package+
    Creating ./foogis/foogis/
    Copying __init__.py to ./foogis/foogis/__init__.py
  Copying setup.cfg to ./foogis/setup.cfg
  Copying setup.py_tmpl to ./foogis/setup.py
Running /tmp/foo/bin/python2.6 setup.py egg_info

What we get is

$ find foogis
foogis
foogis/foogis
foogis/foogis/__init__.py
foogis/foogis.egg-info
foogis/foogis.egg-info/dependency_links.txt
foogis/foogis.egg-info/entry_points.txt
foogis/foogis.egg-info/not-zip-safe
foogis/foogis.egg-info/PKG-INFO
foogis/foogis.egg-info/SOURCES.txt
foogis/foogis.egg-info/top_level.txt
foogis/setup.cfg
foogis/setup.py

The package code itself is in foogis/foogis. The top-level foogis directory holds the distribution files: metadata, README, etc.

4. This is a good time to get everything under revision control (except the egg-info, as Tarek points out).

$ cd foogis
$ hg init
$ hg add --exclude *egg-info
adding foogis/__init__.py
adding setup.cfg
adding setup.py
$ hg commit -m "Start of the FooGIS project"

5. Install nose and coverage. Nose flattens the testing learning curve and coverage tells us how comprehensive our tests are.

$ cd ..
$ ./bin/pip install nose
Downloading/unpacking nose
...
Successfully installed nose
$ ./bin/pip install coverage
Downloading/unpacking coverage
...
Successfully installed coverage

6. Write some tests. The sooner we start testing, the better. Few things are more painful than writing tests a few hundred lines of code down the road.

$ cd foogis
$ vim foogis/tests.py

Here's the first test, taking advantage of nose's conventions for finding tests.

from foogis import Point

def test_foogis():
    assert Point(0.0, 0.0).x == 0.0

Nose lets you start testing immediately, avoiding the intricacies of unittest until you need them. Before we run the tests, we'll fully activate the virtual environment, adjusting executable paths so that we don't have to be explicit about them. (It's true, as pointed out in comments, that we could have done this at the outset.)

$ source ../bin/activate
$ which nosetests
/private/tmp/foo/bin/nosetests

Without any code, the tests fail, of course.

$ nosetests foogis
E
======================================================================
ERROR: Failure: ImportError (cannot import name Point)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/private/tmp/foo/lib/python2.6/site-packages/nose/loader.py", line 382, in loadTestsFromName
    addr.filename, addr.module)
  File "/private/tmp/foo/lib/python2.6/site-packages/nose/importer.py", line 39, in importFromPath
    return self.importFromDir(dir_path, fqname)
  File "/private/tmp/foo/lib/python2.6/site-packages/nose/importer.py", line 86, in importFromDir
    mod = load_module(part_fqname, fh, filename, desc)
  File "/private/tmp/foo/foogis/foogis/tests.py", line 1, in <module>
    from foogis import Point
ImportError: cannot import name Point

----------------------------------------------------------------------
Ran 1 test in 0.001s

FAILED (errors=1)

7. Write code and test.

$ vim foogis/__init__.py
class Point(object):
    def __init__(self, x, y):
        self.x = float(x)
        self.y = float(y)
    def __repr__(self):
        return 'Point (%s %s)' % (self.x, self.y)

Now, we run nosetests again with the coverage module:

$ nosetests --with-coverage foogis
.
Name     Stmts   Exec  Cover   Missing
--------------------------------------
foogis       6      5    83%   6
----------------------------------------------------------------------
Ran 1 test in 0.002s

OK

The tests pass, but we're missing a test of the __repr__ method on line 6. Let's add one.

$ vim foogis/tests.py
from foogis import Point

def test_foogis():
    assert Point(0.0, 0.0).x == 0.0
    assert repr(Point(0.0, 0.0)) == 'Point (0.0 0.0)'

and re-run the tests.

$ nosetests --with-coverage foogis
.
Name     Stmts   Exec  Cover   Missing
--------------------------------------
foogis       6      6   100%
----------------------------------------------------------------------
Ran 1 test in 0.002s

OK
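As the test module grows, it may be worth giving each behaviour its own function, so that a failure report from nose names the behaviour that broke. This is a minimal sketch: the Point class from step 7 is repeated so the block runs on its own (in the project it would be imported from foogis), and the test function names are made up for illustration.

```python
# Stand-in for "from foogis import Point" so this block is self-contained.
class Point(object):
    def __init__(self, x, y):
        self.x = float(x)
        self.y = float(y)
    def __repr__(self):
        return 'Point (%s %s)' % (self.x, self.y)

# nose collects module-level functions whose names match its test
# pattern; names beginning with "test_" are the usual choice.
def test_coordinates_are_floats():
    p = Point(1, 2)
    assert isinstance(p.x, float)
    assert p.y == 2.0

def test_repr():
    assert repr(Point(0.0, 0.0)) == 'Point (0.0 0.0)'
```

With the tests split like this, nosetests should report two tests instead of one, and coverage of foogis stays the same.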

8. Commit the changes and make a distribution.

$ hg add foogis/tests.py
$ hg commit -m "Added a Point class, with tests"
$ python setup.py sdist
running sdist
running egg_info
writing foogis.egg-info/PKG-INFO
writing top-level names to foogis.egg-info/top_level.txt
writing dependency_links to foogis.egg-info/dependency_links.txt
writing entry points to foogis.egg-info/entry_points.txt
reading manifest file 'foogis.egg-info/SOURCES.txt'
writing manifest file 'foogis.egg-info/SOURCES.txt'
creating foogis-0.1dev
creating foogis-0.1dev/foogis
creating foogis-0.1dev/foogis.egg-info
making hard links in foogis-0.1dev...
hard linking setup.cfg -> foogis-0.1dev
hard linking setup.py -> foogis-0.1dev
hard linking foogis/__init__.py -> foogis-0.1dev/foogis
hard linking foogis/tests.py -> foogis-0.1dev/foogis
hard linking foogis.egg-info/PKG-INFO -> foogis-0.1dev/foogis.egg-info
hard linking foogis.egg-info/SOURCES.txt -> foogis-0.1dev/foogis.egg-info
hard linking foogis.egg-info/dependency_links.txt -> foogis-0.1dev/foogis.egg-info
hard linking foogis.egg-info/entry_points.txt -> foogis-0.1dev/foogis.egg-info
hard linking foogis.egg-info/not-zip-safe -> foogis-0.1dev/foogis.egg-info
hard linking foogis.egg-info/top_level.txt -> foogis-0.1dev/foogis.egg-info
copying setup.cfg -> foogis-0.1dev
Writing foogis-0.1dev/setup.cfg
creating dist
tar -cf dist/foogis-0.1dev.tar foogis-0.1dev
gzip -f9 dist/foogis-0.1dev.tar
removing 'foogis-0.1dev' (and everything under it)

The file at dist/foogis-0.1dev.tar.gz is ready to be distributed to users of our package. Let's get them to install it using pip.

$ pip install dist/foogis-0.1dev.tar.gz
Unpacking ./dist/foogis-0.1dev.tar.gz
  Running setup.py egg_info for package from file:///private/tmp/foo/foogis/dist/foogis-0.1dev.tar.gz
Installing collected packages: foogis
  Running setup.py install for foogis
Successfully installed foogis

Do read the comment below about disabling (in setup.cfg) the "dev" tag in the distribution version string. Paste's "basic_package" template isn't the optimal template for every developer community. I'm familiar with the many additional features of the "ZopeSkel" template from Zope and Plone. I can imagine that commercial or semi-commercial efforts to grow Python developer communities (particularly thinking of ESRI here) might also be served well by specialized project templates.
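On the "dev" tag: the tag_build option in the [egg_info] section of setup.cfg is what appends "dev" to the version, which is why the sdist above came out as foogis-0.1dev. Here is a rough sketch of a pre-release check, written against Python 3's configparser (on the Python 2.6 used above the module is named ConfigParser). The function name and the sample text are illustrative assumptions, not part of the template.

```python
import configparser

def build_tag(setup_cfg_text):
    """Return the egg_info tag_build value, or '' if none is set."""
    parser = configparser.ConfigParser()
    parser.read_string(setup_cfg_text)
    return parser.get('egg_info', 'tag_build', fallback='').strip()

# Sample contents modelled on the options mentioned in the comments below.
DEV_CFG = """\
[egg_info]
tag_build = dev
tag_svn_revision = true
"""

# What the section might look like once the tag has been removed.
RELEASE_CFG = """\
[egg_info]
"""

print(repr(build_tag(DEV_CFG)))      # non-empty tag: not ready to release
print(repr(build_tag(RELEASE_CFG)))  # empty: version string has no suffix
```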

Comments

see also

Author: Jonathan Hartley

Brilliant, many thanks. In particular I didn't realise coverage was so easy to use, and I've never got to grips with paster - I'll give it a try now.

Also, for people looking for more of the same, this nicely complements the following, which goes into more detail on some aspects:

http://infinitemonkeycorps.net/docs/pph/

Re: Bootstrapping a Python project

Author: Sean

Thanks, Jonathan. John Kleint's howto looks excellent, and does indeed cover important stuff that I skipped, such as how to write a readme and documentation.

Re: Bootstrapping a Python project

Author: Tarek Ziadé

Nice article !

One minor point: I would not put the *.egg-info dir under revision control. It's a generated content that will change all the time, and that is not required when people get the source from the repository.

Btw: Would you be interested to include this document in the Hitchhiker's Guide to Packaging ? (which is planned to be included in the official docs.python.org at some point) I think it's a great help for people to get started.

Re: Bootstrapping a Python project

Author: Sean

Sure, Tarek, after a little more feedback it might be worth including. I always try to avoid committing the egg-info, or remove it soon after accidentally committing it. The --exclude option (now used above) is handy.

dev releases

Author: Kevin Teague

It's also worth mentioning that setup.cfg needs to be edited before a release is made so that there is no "tag_build = dev" in the [egg_info] section (and you can get rid of the "tag_svn_revision = true" if you aren't using SVN). Users generally shouldn't be installing dev releases since they are unversioned.

Personally, I get rid of the setup.cfg and append dev inside the setup.py file, so that there is one less file to have to think about. Opinions on this file go both ways though ...

http://philikon.wordpress.com/2008/06/05/setupcfg-considered-harmful/

Re: Bootstrapping a Python project

Author: Brian

Install paster if it wasn't already installed under our original Python (or if you used the --no-site-packages option)

Does pip respect http_proxy? I'm pretty sure urllib2 supports the http_proxy variable, so I imagine it probably does...right?

Re: Bootstrapping a Python project

Author: Marius Gedminas

Good article!

Personally I use zc.buildout instead of virtualenv: so it's one less directory level to handle, and the sandbox creation can be automated (which is useful for other people who want to check out your source tree and start working on it). It has downsides (another config file in your tree; poor documentation) and upsides (common 3rd-party packages can be shared between many projects, which speeds up downloading/installing of new environments and is a killer feature in my book).

One important point that bit me recently: nose --with-coverage remembers coverage results from previous runs, so if you change something and re-run, you won't know your actual coverage. *Always* use nosetests --with-coverage --cover-erase.

Re: Bootstrapping a Python project

Author: Benjamin Sergeant

This foogis package looks very cool, can I download it somewhere ?

;) ... I'm in the April's fool mood ... didn't know about repr and pastescript, good stuff.

Re: Bootstrapping a Python project

Author: srid

Do take a look at modern-package-template as an alternative for basic_package: http://pypi.python.org/pypi/modern-package-template

de9im: DE-9IM utilities

As part of my continuing education about the theory and methods underlying Shapely, GEOS, JTS, and the OGC's Simple Features specs, I've written a small package of utilities for working with DE-9IM [1] matrices and patterns: http://bitbucket.org/sgillies/de9im/. Shapely provides the standard predicates (these are probably my favorite OGC standards) as geometry class methods,

>>> from shapely.wkt import loads
>>> p = loads('POLYGON ((1.0 0.0, 0.0 -1.0, -1.0 0.0, 0.0 1.0, 1.0 0.0))')
>>> q = loads('POLYGON ((3.0 0.0, 2.0 -1.0, 1.0 0.0, 2.0 1.0, 3.0 0.0))')
>>> p.disjoint(q)
False
>>> p.intersects(q)
True
>>> p.touches(q)
True

but what if you wanted to test whether the features touched at exactly one point only? A "side hug", you might say. Instead of computing the intersection and checking its geometry type, you can use the de9im package to define a DE-9IM matrix pattern and test it against the relation matrix for the two features. The 0 in the pattern below requires that the intersection of the boundaries of the features be a 0-dimensional figure. In other words: a point.

>>> from de9im import pattern
>>> side_hug = pattern('FF*F0****')
>>> im = p.relate(q)
>>> print im
FF2F01212
>>> side_hug.matches(im)
True
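The matching rule itself is small. The following is a sketch of the standard DE-9IM pattern semantics, not the de9im package's own code: T accepts any non-empty intersection (0, 1 or 2), F accepts only an empty one, * accepts anything, and a digit accepts exactly that dimension. The function name is hypothetical.

```python
def matches(pattern, matrix):
    """Test a 9-character DE-9IM matrix string against a pattern string."""
    if len(pattern) != 9 or len(matrix) != 9:
        raise ValueError("DE-9IM strings have exactly 9 characters")
    for want, got in zip(pattern.upper(), matrix.upper()):
        if want == '*':
            continue                    # wildcard: anything goes
        if want == 'T':
            if got not in '012':        # any non-empty intersection
                return False
        elif want != got:               # 'F', '0', '1', '2': exact match
            return False
    return True

print(matches('FF*F0****', 'FF2F01212'))   # True: the side hug
print(matches('FF*F1****', 'FF2F01212'))   # False: needs a shared edge
```

The fifth character is the boundary/boundary cell, so swapping the 0 for a 1 asks for boundaries that meet along a line rather than at a point.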

The de9im package is 100% tested, which gives me a good starting point for experimenting with more optimal implementations.

There seems to be almost enough standardization between GeoDjango, Shapely, and SQLAlchemy that I could make these patterns callable, and call the relate method on a pair of objects:

class Pattern(object):
    def __call__(self, a, b):
        return self.matches(a.relate(b))

to use like so:

>>> side_hug(p, q)
True
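Filled out enough to run, the idea might look like the following. The only thing asked of a and b is a relate method that returns a DE-9IM string, which Shapely geometries provide (p.relate(q) above). FakeGeom is a hypothetical stand-in so that the sketch runs without Shapely, and the regular-expression matcher is a shortcut for this sketch rather than the de9im implementation.

```python
import re

class Pattern(object):
    """A DE-9IM pattern that can be called on a pair of geometries."""

    def __init__(self, pattern_string):
        self.pattern_string = pattern_string
        # Translate to a regex: T -> any of 0/1/2, * -> any character.
        translated = pattern_string.upper().replace('T', '[012]').replace('*', '.')
        self._regex = re.compile('^%s$' % translated)

    def matches(self, matrix_string):
        return self._regex.match(matrix_string) is not None

    def __call__(self, a, b):
        # Duck typing: any object with a relate() method will do.
        return self.matches(a.relate(b))

class FakeGeom(object):
    """Hypothetical stand-in for anything with a relate() method."""
    def __init__(self, canned_matrix):
        self.canned_matrix = canned_matrix
    def relate(self, other):
        return self.canned_matrix

side_hug = Pattern('FF*F0****')
print(side_hug(FakeGeom('FF2F01212'), FakeGeom('FF2F01212')))  # True
```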

Comments

reading list

Author: Jonathan Hartley

Hey Sean,

Recently I think you published a small reading list of geometric papers and the like on topics such as this, and now I can't find it again. Did I imagine this? Can you point me to it? Thanks.

Jonathan

Re: de9im: DE-9IM utilities

Author: Sean

It's http://www.zotero.org/groups/geography-and-computing. If you're interested in adding references, let me know.

Re: de9im: DE-9IM utilities

Author: Jonathan Hartley

Thanks! That link once again, without the terminating period included:

http://www.zotero.org/groups/geography-and-computing

RESTful hypermedia agents

Stuart Charlton says stuff about Building a RESTful Hypermedia Agent, Part 1:

Building a hypermedia-aware client is rather different from building a typical client in a client/server system. It may not be immediately intuitive. But, I believe the notions are rooted in (quite literally) decades of experience in other computing domains that are agent-oriented. Game behaviour engines, control systems, reactive or event-driven systems all have been developed with this programming approach in mind.

He points to a diagram from Artificial Intelligence: A Modern Approach [1] and adapts it to the RESTful web. Agent sensors become HTTP's "safe" methods (GET), effectors the "unsafe" methods (POST, PUT, DELETE). The HTTP protocol and content type definitions make up the agent's model of the evolving, mutable state of its environment (the web). I'm enjoying thinking about RESTful client-server interactions on the web in these well-reasoned terms. The REST style, to me, is all about enabling software agents. Not just web browsers or search index crawlers, but agents that might mine your texts, geocode your news, or ETL your spatial data (and wash your socks). I'm looking forward to the next installment.
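One way to make that mapping concrete, as a toy sketch rather than anything from Charlton's post: sort an agent's possible requests into sensing and acting by whether HTTP defines the method as safe. The names below are hypothetical. HEAD and OPTIONS are added here because the HTTP specification also classes them as safe.

```python
# Methods that HTTP defines as safe (read-only from the client's view).
SENSORS = frozenset(['GET', 'HEAD', 'OPTIONS'])
# Methods that may change state on the server.
EFFECTORS = frozenset(['POST', 'PUT', 'DELETE'])

def role(method):
    """Classify an HTTP method as an agent 'sensor' or 'effector'."""
    method = method.upper()
    if method in SENSORS:
        return 'sensor'
    if method in EFFECTORS:
        return 'effector'
    raise ValueError('unclassified method: %s' % method)

def plan(steps):
    """Split (method, url) steps into what the agent observes and what it changes."""
    observe = [s for s in steps if role(s[0]) == 'sensor']
    change = [s for s in steps if role(s[0]) == 'effector']
    return observe, change

observe, change = plan([
    ('GET', 'http://example.com/items'),
    ('POST', 'http://example.com/items'),
    ('DELETE', 'http://example.com/items/1'),
])
print(len(observe), len(change))
```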

Origin of the multi-geometry

I'm trying to track down the origin of the geometry collection concept. Did it originate inside or outside GIS? JSTOR has nothing for me. Via Martin Davis, I've found a paper by Egenhofer and Herring [1] [PDF] which mentions a "complex line" on page 6:

– A complex line is a line with more than two disconnected boundaries (Figure 1d).

but makes no mention of multipoints, multilines, or multipolygons. Figure 1d in that paper shows a forking line, like a lower-case Greek lambda: λ.

Anybody have a good reference?

Comments

Most likely GIS

Author: GIS oriented

How else would you represent the multiple islands of Hawaii or states like Michigan as a single shape?

Re: Origin of the multi-geometry

Author: Sean

Sure, but where did the concept first crop up? In a paper? In software?

Re: Origin of the multi-geometry

Author: Sean

Another read of that paper makes me think there may be a germ of geometry collections in "cell complexes".

Re: Origin of the multi-geometry

Author: Martin Davis

Cell complexes are quite different to Multi-geometry. They are a fairly deep topological concept.

To turn the question around - why do you care? I suspect any "origin" you come up with won't be all that interesting, since Multi-geometry is just a fairly obvious extension to single geometries, so it's likely to have been invented numerous times (how long have shapefiles been around?)

Sensors, things, and the Web

My readers are probably aware of the OGC's Sensor Web initiatives, but there's another, different vision of a "Web of Things" using the architecture and infrastructure of the actual web we have now (URIs, HTTP, Atom, JSON, HTML, JavaScript) that's well articulated in this SXSW presentation by Vlad Trifa and Dominique Guinard (and also in their blog, via This week in REST) and in an associated technical report (PDF).

Comments

Re: Sensors, things, and the Web

Author: Miguel Montesinos

I deeply agree with the "Web of Things" vision. I don't think that's the "different vision", but one of the most likely visions to happen.

I'm involved in several similar European R&D projects and initiatives, and none of them cares about OGC's SWE, apart from what we are proposing.

SWE is quite powerful, but really complicated for becoming a backbone of the Internet of Things.

Miguel

ISAW Visit

Last week I made my first 2010 trip to the Institute for the Study of the Ancient World at New York University for a workshop with researchers and programmers from the University of Heidelberg's Epigraphische Datenbank Heidelberg. The stuff I work on daily is only a fraction of ISAW's digital projects, which are in turn only a fraction of ISAW's business. I had a day before the workshop to catch up with what's going on in the ISAW library and exhibition groups.

But first I had to fly across the Atlantic in this A380, which flies as smoothly and quietly as advertised.

http://farm3.static.flickr.com/2772/4418841703_9600d1e379_d.jpg http://farm5.static.flickr.com/4040/4418841705_587a41402b_d.jpg

I shared a row with a retiree from Avignon. His US-based kids were flying him to NYC and then Aspen for his 60th birthday. We talked about the food and geography of France and the Southwest US – Mexican cuisine in particular, which I've been craving and he'd discovered on a previous trip to Colorado, Utah, and Arizona. I heard French on the street in New York, and among visiting scholars at ISAW, but that would be the last French I'd speak for a week.

Next is a crappy mobile phone photo of one of the fine banners ISAW put up on the Museum Mile (ISAW is just half a block east of 5th Ave on 84th Street) to advertise the Old Europe exhibit.

http://farm5.static.flickr.com/4055/4418841707_91de188dcd_d.jpg

In this context, "Old Europe" refers to a largely forgotten Neolithic and Copper Age culture established along the Danube River during a wave of emigration from Anatolia that also settled the Aegean islands and what are now Macedonia and Greece (see also Cucuteni-Trypillian culture). The objects in the exhibition are from museums in Bulgaria, Moldova, and Romania, and are being shown in the US for the first time. Here's a nice Flickr photo set made by an exhibit visitor:

http://farm3.static.flickr.com/2793/4418996097_a2d03cd10e_d.jpg

The Metropolitan Museum of Art has nothing from this culture, but does have a collection of almost contemporary Early Cycladic objects.

The exhibit is very well done, widely reported and well reviewed in the local media, and well attended. Chapeau to Jennifer Chi and the exhibition team. There's a nice catalog book edited by David W. Anthony with chapters that dive deeper into the archaeology and history of the culture. The exhibits and digital projects groups have a bunch of ideas of how to improve the integration of physical, print, and web materials for upcoming exhibits as we roll out the new ISAW website. The catalog has some great maps by Brian Turner from UNC's Ancient World Mapping Center (where I worked previously) but I think a KML application could take the geography to another level. The exhibit runs through April 26 after which you'll have to travel to Southeastern Europe to see these objects.

I met ISAW's newest technical people, Michael Edgcumbe (who keeps the wheels on office computing) and Christopher Warner (lead on the new website), in person for the first time. The workshop went well, too. More about that later after I push new code up to our site.

While packing for the return trip, I heard that I'd be coming back to snow. Indeed: when we descended below the clouds I saw Montpellier and much of the Hérault department covered with snow. We got about 10 cm in the neighborhood, some of which remains to be seen in the photo below:

http://farm5.static.flickr.com/4014/4416631557_cb66eca519_d.jpg

Prunus dulcis

Normally, turkey vultures inform me of Spring's arrival; I'm looking around for other signs here in Montpellier. Returning yesterday from a week in the snowy Alpes, I saw trees in full bloom along the A9 near Nîmes. Perhaps we had slightly cooler weather here last week, because the almond trees in the neighborhood green space are just getting started:

http://farm5.static.flickr.com/4021/4394353543_76effcfcc7_d.jpg

PyCon interview with Sanjiv Singh

Here's an interview with Sanjiv Singh (TurboGears, GeoAlchemy) in The Bitsource:

Sanjiv Singh is a Python developer from New Delhi, India who contributes to the TurboGears and Toscawidgets projects. Singh recently gave a talk on “TurboGears2 Geospatial Framework” at PyCon 2010, which is a framework for developing Geographic Information System Applications, such as the Blind Audio Tactile System.

I only found two PyCon talks related to GIS: Sanjiv's TurboGears Geospatial Framework and Roy Hyunjin Han's How Python is guiding infrastructure construction in Africa.

Saving bandwidth and more using httplib2

Here's a comment that more properly belongs on Saving bandwidth using Python. The requirement to register was (to me) a blocker for leaving it there. Anne writes:

When Internet connection is a limited resource, a well-designed website doesn’t perform multiple times the same request. This little adjustment can significantly reduce the time required to load and refresh a page. First-world programmers should keep this in mind, or better come to South Africa and experience it in person ...

The solution involves a wrapper around urllib.urlretrieve that partially implements HTTP caching. A more robust solution might instead use the almost transparent Last-Modified or ETag validated caching that is built into httplib2. See also Mark Pilgrim's notes on httplib2 in Dive Into Python 3 (httplib2 works fine with Python 2.3+). Saves bandwidth, development time, and bug chasing.
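To show what "validated caching" buys, here is a toy model of the ETag handshake, with a fake in-process server standing in for the network. It is a sketch of the protocol, not of httplib2's internals, and every name in it is hypothetical. With httplib2 itself, the cache is switched on by passing a cache directory when constructing httplib2.Http.

```python
class FakeOrigin(object):
    """Stands in for a web server that supports ETag validation."""
    def __init__(self, body):
        self.body = body
        self.etag = '"v1"'
        self.bytes_sent = 0

    def get(self, headers):
        # A matching If-None-Match means the client's copy is still good.
        if headers.get('If-None-Match') == self.etag:
            return 304, {'ETag': self.etag}, b''
        self.bytes_sent += len(self.body)
        return 200, {'ETag': self.etag}, self.body

class ValidatingClient(object):
    """Keeps one cached copy and revalidates it on every request."""
    def __init__(self, origin):
        self.origin = origin
        self.cached = None   # (etag, body) once we have a copy

    def fetch(self):
        headers = {}
        if self.cached is not None:
            headers['If-None-Match'] = self.cached[0]
        status, response_headers, body = self.origin.get(headers)
        if status == 304:
            return self.cached[1]          # reuse the stored body
        self.cached = (response_headers['ETag'], body)
        return body

origin = FakeOrigin(b'x' * 10000)
client = ValidatingClient(origin)
first = client.fetch()    # full download
second = client.fetch()   # revalidated: headers only, no body
print(first == second, origin.bytes_sent)
```

The second fetch still costs a round trip, but the body crosses the wire only once; it is sent again only if the server's ETag changes.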

Comments

Re: Saving bandwidth and more using httplib2

Author: Barry Rowlingson

Or set up a local, or institutional, caching proxy using squid and apache? Most programs these days can be configured to use them, either via setting http_proxy as an environment variable or custom config settings.

Then you get to set timeouts and E-tag magic and it can work for everyone on your PC/at your institution for all HTTP content. Big wins all round.

Re: Saving bandwidth and more using httplib2

Author: Sean

Better yet, I agree, for an enterprise. I assumed that the Linfiniti work had a smaller scope.

Re: Saving bandwidth and more using httplib2

Author: Tim Sutton

Actually Anne left out a little bit of the story - we are writing a web mapping client that builds a legend using wms GetLegendGraphic requests to a third party service. So the client is on one (unknown) network where we can't impose squid etc, the server on another and the wms service on a 3rd. The problem is that each time the client requests a page that has a legend on it, they wait for time consuming getlegendgraphic requests. So our solution is to cache the legend graphic requests on our server so that we can give a known response time to the web client. In this context squid / proxying requests wouldn't be an option.

Regards

Tim