2014

Rasterio 0.15 and a cheat sheet

Here is what's new in Rasterio 0.15. The biggest changes are the ones under the hood that permit opening non-TIFF formats in 'r+' and 'w' modes. The one API change was made to align better with Numpy: any output keyword args are superseded by out, and we warn you about the future removal of output. In the command line programs we're adding -f and --format as preferred aliases for the older --driver option. We're closing in on the programming and command line interfaces that will be finalized in 1.0.
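The move from one keyword to another follows a common deprecation pattern. Here is a minimal sketch of that pattern, not Rasterio's actual code: read_band is a hypothetical function, the warning category is an assumption, and only the keyword names out and output come from the release.

```python
import warnings


def read_band(out=None, output=None):
    """Hypothetical reader: 'out' is preferred, 'output' is deprecated."""
    if output is not None:
        warnings.warn(
            "The 'output' keyword is deprecated, use 'out' instead",
            FutureWarning,
            stacklevel=2,
        )
        # An explicit 'out' takes precedence over the deprecated keyword.
        if out is None:
            out = output
    if out is None:
        out = [0, 0, 0]  # stand-in for a newly allocated array
    # A real reader would fill 'out' with pixel values here.
    return out
```

Callers that still pass output keep working but see a warning, which gives them time to switch before the old keyword is removed.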

Inspired by Derek Watkins, I've begun a Fiona/Rasterio/Shapely cheat sheet modeled after his popular GDAL/OGR command line cheat sheet. It's been a great rubric for identifying the key features that should be in the Fiona and Rasterio CLIs. It also has fun examples of using fio and rio with GNU Parallel, jq, and geojsonio-cli.

$ fio cat input.shp --x-json-seq-no-rs \
> | parallel --pipe "jq -c 'select(.id==\"10\")'" \
> | fio collect \
> | geojsonio

0.15 features in the cheat sheet include version inspection,

$ rio --version
0.15

format driver enumeration,

$ rio env --formats
AAIGrid: Arc/Info ASCII Grid
ACE2: ACE2
ADRG: ARC Digitized Raster Graphics
AIG: Arc/Info Binary Grid
ARG: Azavea Raster Grid format
AirSAR: AirSAR Polarimetric Image
...
ZMap: ZMap Plus Grid

and stacking raster bands to produce new multiband datasets.

$ rio stack tests/data/RGB.byte.tif --bidx 1..3 -o stacked.jpg -f JPEG

Unix style spatial ETL with fio cat, collect, and load

In Fiona 1.4.0 I added a fio-cat command to the CLI which works much like UNIX cat. It opens one or more vector datasets, concatenating their features and printing them to stdout as a sequence of GeoJSON features.

$ fio cat docs/data/test_uk.shp | head -n 2
{"geometry": {"coordinates": [...], "type": "Polygon"}, "id": "0", "properties": {"AREA": 244820.0, "CAT": 232.0, "CNTRY_NAME": "United Kingdom", "FIPS_CNTRY": "UK", "POP_CNTRY": 60270708.0}, "type": "Feature"}
{"geometry": {"coordinates": [...], "type": "Polygon"}, "id": "1", "properties": {"AREA": 244820.0, "CAT": 232.0, "CNTRY_NAME": "United Kingdom", "FIPS_CNTRY": "UK", "POP_CNTRY": 60270708.0}, "type": "Feature"}

I've replaced most of the coordinates with ellipses to save space in the code block above, something I'll continue to do in examples below.

I said that fio-cat concatenates features of multiple files and you can see this by using wc -l.

$ fio cat docs/data/test_uk.shp | wc -l
      48
$ fio cat docs/data/test_uk.shp docs/data/test_uk.shp | wc -l
      96

If you look closely at the output, you'll see that every GeoJSON feature is a standalone text and each is preceded by an ASCII RS (0x1E) control character. These allow you to cat pretty-printed GeoJSON (using the --indent option) containing newlines that can still be understood as a sequence of texts by other programs. Software like Python's json module and Node's underscore-cli will trip over unstripped RS, so you can disable the RS control characters and emit LF delimited sequences of GeoJSON (with no option to pretty print, of course) using --x-json-seq-no-rs.
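A program on the receiving end has to handle both flavors of sequence. The following is a minimal sketch using only Python's json module; parse_sequence is a hypothetical helper, not part of Fiona.

```python
import json

RS = "\x1e"  # ASCII record separator


def parse_sequence(text):
    """Yield JSON objects from RS-delimited or LF-delimited text."""
    if RS in text:
        # Each text starts with RS and may span several lines.
        chunks = text.split(RS)
    else:
        # Without RS, every non-empty line is one complete text.
        chunks = text.split("\n")
    for chunk in chunks:
        if chunk.strip():
            yield json.loads(chunk)
```

Splitting on RS first is what lets pretty-printed features, which contain newlines, survive intact.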

To complement fio-cat I've written fio-load and fio-collect. They read features from a sequence (RS or LF delimited) and respectively write them to a formatted vector file (such as a Shapefile) or print them as a GeoJSON feature collection.
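Conceptually, collecting is a small operation. This sketch shows the idea for an LF-delimited sequence; it is an illustration under that assumption, not the fio-collect implementation, and the two features are made up.

```python
import json


def collect(lines):
    """Wrap a sequence of GeoJSON feature texts in a feature collection."""
    features = [json.loads(line) for line in lines if line.strip()]
    return {"type": "FeatureCollection", "features": features}


# Two made-up features standing in for the output of fio cat.
lines = [
    '{"type": "Feature", "id": "0", "geometry": null, "properties": {}}',
    '{"type": "Feature", "id": "1", "geometry": null, "properties": {}}',
]
collection = collect(lines)
print(json.dumps(collection, indent=4, sort_keys=True))
```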

Here's an example of using fio-cat and load together. You should tell fio-load what coordinate reference system to use when writing the output file because that information isn't carried in the GeoJSON features written by fio-cat.

$ fio cat docs/data/test_uk.shp \
> | fio load --driver Shapefile --dst_crs EPSG:4326 /tmp/test_uk.shp
$ ls -l /tmp/test_uk.*
-rw-r--r--  1 seang  wheel     10 Oct  5 10:09 /tmp/test_uk.cpg
-rw-r--r--  1 seang  wheel  11377 Oct  5 10:09 /tmp/test_uk.dbf
-rw-r--r--  1 seang  wheel    143 Oct  5 10:09 /tmp/test_uk.prj
-rw-r--r--  1 seang  wheel  65156 Oct  5 10:09 /tmp/test_uk.shp
-rw-r--r--  1 seang  wheel    484 Oct  5 10:09 /tmp/test_uk.shx

And here's one of fio-cat and collect.

$ fio cat docs/data/test_uk.shp | fio collect --indent 4 | head
{
    "features": [
        {
            "geometry": {
                "coordinates": [
                    [
                        [
                            0.899167,
                            51.357216
                        ],
$ fio cat docs/data/test_uk.shp | fio collect --indent 4 | tail
                "CAT": 232.0,
                "CNTRY_NAME": "United Kingdom",
                "FIPS_CNTRY": "UK",
                "POP_CNTRY": 60270708.0
            },
            "type": "Feature"
        }
    ],
    "type": "FeatureCollection"
}

Does it look like I've simply reinvented ogr2ogr? The difference is that with fio-cat and fio-load there's space in between for programs that process features. The programs could be written in any language. They might use Shapely, they might use Turf. The only requirement is that they read and write sequences of GeoJSON features using stdin and stdout. A nice property of programs like these is that you can sometimes parallelize them cheaply using GNU parallel.
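A program that sits in that space can be very small. Below is a sketch of a filter that keeps features with a given id from an LF-delimited sequence; select_by_id is a hypothetical name, and in-memory streams stand in for stdin and stdout so the example is self-contained.

```python
import io
import json


def select_by_id(instream, outstream, feature_id):
    """Copy features whose 'id' matches from one LF-delimited stream to another."""
    for line in instream:
        if not line.strip():
            continue
        feature = json.loads(line)
        if feature.get("id") == feature_id:
            outstream.write(json.dumps(feature) + "\n")


# Demonstration with in-memory streams standing in for stdin and stdout.
source = io.StringIO(
    '{"type": "Feature", "id": "9", "geometry": null, "properties": {}}\n'
    '{"type": "Feature", "id": "10", "geometry": null, "properties": {}}\n'
)
sink = io.StringIO()
select_by_id(source, sink, "10")
```

In a real pipeline the call would be select_by_id(sys.stdin, sys.stdout, "10"), with fio cat upstream and fio collect or fio load downstream. Because each feature is handled independently, a filter like this is a natural candidate for GNU parallel.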

The fio-buffer program (unreleased) in the example below uses Shapely to calculate a 100 km buffer around features (in Web Mercator, I know!). Parallel doesn't help in this example because the sequence of features from fio-cat is fairly small, but I want to show you how to tell parallel to watch for RS as a record separator.

$ fio cat docs/data/test_uk.shp --dst_crs EPSG:3857 \
> | parallel --pipe --recstart '\x1E' fio buffer 1E+5 \
> | fio collect --src_crs EPSG:3857 \
> | geojsonio

Here's the result. Unix pipelines, still awesome at the age of 41!

The other point of this post is that, with the JSON Text Sequence draft apparently going to publication, sequences of GeoJSON features not collected into a GeoJSON feature collection are very close to being a real thing that developers should be supporting.

Python at FOSS4G 2014

There were plenty of other Python talks at FOSS4G and I plan to watch them when the videos are online (update: talks are appearing now at http://vimeo.com/foss4g). I hadn't been aware of ogrtools, which is unlucky because there's plenty of functional overlap between it and Fiona. The designs seem rather different because Fiona doesn't emulate XML tool chains (GDAL's VRTs are not unlike XSLT) and is more modular. For example, where ogrtools has a file-to-file ogr translate command, Fiona has a fio dump and fio load pair connected by a stream of GeoJSON objects. The ogrtools talk is right near the top of my list of talks to see.

I was very fortunate to go right after Mike Bostock's keynote. It got people thinking about tools and design, and that's exactly the conversation that I'm trying to engage developers in with Fiona and Rasterio, if with less insight and perspective than Mike. I reminded attendees that the best features of our day-to-day programming languages are sometimes disjoint and showed this diagram (in which C is yellow, JavaScript is magenta, and Python is blue; by "GC" I mean garbage collection and by "{};" I mean extraneous syntax).

https://sgillies.github.io/foss4g-2014-fiona-rasterio/img/py-js-c.png

D3 embraces browser standards and all they entail (a worldwide knowledge base and continuous performance improvements) and Fiona and Rasterio embrace the good parts of Python. Written like C, as we usually see in GDAL/OGR examples on the web, Python is quite slow. Idiomatic Python, including the good parts like list comprehensions, generators, and iterators, is dramatically faster. While Fiona and Rasterio don't do particular operations faster than the older GDAL and OGR bindings (because it's the same C library underneath), they are designed from the bottom up for a good fit with more efficient idiomatic Python code.
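To make the contrast concrete, here is a small sketch of the two styles computing the same sum. The records are made up and merely stand in for features read from a dataset; no timing is claimed for this toy case.

```python
# Made-up records standing in for features read from a dataset.
features = [
    {"id": str(i), "properties": {"POP_CNTRY": 1000.0 * i}} for i in range(5)
]

# C style: index arithmetic and manual accumulation.
total_c = 0.0
i = 0
while i < len(features):
    total_c = total_c + features[i]["properties"]["POP_CNTRY"]
    i = i + 1

# Idiomatic style: iterate directly and let a generator expression feed sum().
total_py = sum(f["properties"]["POP_CNTRY"] for f in features)
```

Both styles give the same answer. The second has no index bookkeeping and works just as well when features is a lazy iterator instead of a list.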

I plugged Click and Cython in my talk, too, and discussed them afterwards. I found tons of interest in Python at FOSS4G and lots of good ideas about how to use it.

I confess that I didn't pay a lot of attention to the talk schedule before the conference. My summer was kind of nuts and I don't subscribe to any OSGeo lists. When I did look closely I was surprised to find that many people were giving two talks and some three. If any woman or first-timer didn't get a chance to speak while some dude got three talks (and the multiple talkers were all men and long-time attendees, as far as I can tell), that's a bug in the talk selection that needs to be fixed before the next edition.

Lastly, I think the views of Mount Hood you get when flying in and out of PDX to destinations south and east are worth the airfare all by themselves.

https://farm6.staticflickr.com/5587/15249959145_91e47b3444_c_d.jpg

Back from FOSS4G

In my experience, FOSS4G was tons of fun and very well run. Chapeau to the organizing team! I hope other attendees got as much out of the conference as I did. Not only did I get to catch up with people I met at the dawn of FOSS4G, I met great people I'd only known from Twitter and made entirely new acquaintances. I even got to speak a bit of French.

My talk was one of the first in the general sessions. I had fun presenting and am told that I did a good job. My slides are published at http://sgillies.github.io/foss4g-2014-fiona-rasterio/ and you can fork them from GitHub. According to the information at the FOSS4G Live Stream page all the talks will be available online soon. I missed plenty that I'm looking forward to seeing on my computer. Out of the ones I attended, I particularly recommend seeing the following:

  • "Using OpenStreetMap Infrastructure to Collect Data for our National Parks" by James McAndrew, National Park Service
  • "Managing public data on GitHub: Pay no attention to that git behind the curtain" by Landon Reed, Atlanta Regional Commission
  • "Big (enough) data and strategies for distributed geoprocessing" by Robin Kraft, World Resources Institute
  • "An Automated, Open Source Pipeline for Mass Production of 2 m/px DEMs from Commercial Stereo Imagery" by David Shean, University of Washington

Did the code of conduct work? I heard one speaker invoke images of barely competent moms – "so easy your mother can do it" – and was present for an unfortunate reference to hacking private photos at lunch time. I hope that was all of it.

If you attended FOSS4G or watched the live feed I encourage you to write about your experience and impressions. Come on, do it. It doesn't have to be long or comprehensive. Here are a few blog posts I've seen already:

Fiona and Rasterio releases

Like everyone else, I'm making releases before FOSS4G. Fiona 1.2 has a bunch of bug fixes and new features (contributed largely by René Buffat) and Rasterio 0.12 has new CLI commands and options. I'll be talking about these packages and their design and use first thing Wednesday morning (September 10) at FOSS4G. I've also got some things to say about Python programming and geographic data that are not specific to Fiona and Rasterio.

The big deal, however, will be the release of Shapely 1.4 on September 9. This is the first version with major new features since the project made the jump to Python 3. There will be quite a lot of new stuff in 1.4 including better interaction with IPython Notebooks, vectorized functions, an R-tree, and lots of speedups. It's been a group effort largely motivated by development of visualization and analytic frameworks: Cartopy and GeoPandas. Joshua Arnott and Jacob Wasserman in particular have been putting a lot of time into making Shapely better and faster over the past couple of weeks. If you're a Shapely user, please do something nice for these two the next time you see them.

Pruning CRS from GeoJSON

I uploaded version 4 of the GeoJSON I-D to the IETF's tracker yesterday. It contains a major change to section 3. In version 3, the draft contained more or less the same text as in http://geojson.org/geojson-spec.html, but version 4 declares that coordinate reference systems other than the default are not recommended and that means of describing them, including the CRS object of the original 2008 spec, are now application specific concerns. In other words, if you want projected coordinates in the GeoJSON that travels between the front and back ends of your web app you're on your own. Furthermore, you're doing it wrong if you publish this projected GeoJSON to the open web and expect processors to have access to an EPSG database.

I've been watching the IETF JSON Working Group's JSON, I-JSON, and JSON Sequence discussions closely while revising the GeoJSON I-D. Version 4 treats CRS like RFC 7159 treats character encoding, acknowledging other coordinate reference systems while making a very strong recommendation for using the default CRS. You could say CRS84 is our UTF-8. Version 4 also requires that coordinates not be ordered latitude, longitude. Lat/lng is like our byte order mark.
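In practice this means positions are written longitude first. Here is a sketch of a point feature in that order, plus a rough sanity check; looks_like_lon_lat is a hypothetical helper, the coordinates are illustrative, and a range test like this catches only swaps that push latitude out of bounds.

```python
def looks_like_lon_lat(position):
    """Heuristic: True if a position fits CRS84 longitude, latitude bounds."""
    lon, lat = position[0], position[1]
    return -180.0 <= lon <= 180.0 and -90.0 <= lat <= 90.0


# Longitude first, then latitude (illustrative values).
feature = {
    "type": "Feature",
    "geometry": {"type": "Point", "coordinates": [-122.68, 45.52]},
    "properties": {},
}

ok = looks_like_lon_lat(feature["geometry"]["coordinates"])
swapped_ok = looks_like_lon_lat([45.52, -122.68])
```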

Removing the CRS object description from the draft has been a goal of mine from the start. Its poor design has been a distraction and it never was as useful to developers as we intended. The GeoJSON draft is better without it. I get the impression that some standards people will see its removal from the draft as a void to be filled. CRS wonks gotta wonk, I suppose, but do developers care very much that there is no JSON equivalent of <gml:ProjectedCRS>? I don't think so.

SciPy Conference

I'm back from my first ever trip to SciPy, the annual scientific Python community conference. I found it quite amazingly good.

https://conference.scipy.org/scipy2014/site_media/static/img/scipy2014_logo_simple.png

Slides from my talk on Wednesday are at https://sgillies.github.io/scipy-2014-rasterio/. I feel like it went well and hope that you found it (and will find it) useful, too. I missed Tuesday's sessions, which were jam packed with geospatial talks, and didn't have time to check out all the slides from those before presenting. Referencing other talks is something I usually try to do, and failing to do so felt a little weird. My talk was sandwiched between Tyler Erickson's on Google Earth Engine and Shawn Walbridge's on workflows and distribution of GIS Science toolkits; a nice showing of research and science going on at Google, Mapbox, and Esri.

The IPython Notebook seemed to be the major touchstone for presenters and other attendees of the conference, and rightly so. I've got all kinds of plans for doing things with it, many that I expressed at the conference, and I'm certain that it's going to spread in GIS circles. Features coming soon to notebooks near you include: map interactivity, Native Client notebooks, Google Drive hosted notebooks, and a new language agnostic platform. I enjoyed getting to meet Fernando Perez and Brian Granger and congratulate them on the success of the project.

Greg Wilson's keynote, available on YouTube, was a challenge to apply science to the teaching of programming and other subjects. I definitely recommend watching it. A strange recommendation from me, I know, since I'm more likely to say things like "keynotes are terrible." Keynotes that stroke the audience are terrible. Wilson's is better than that.

Best part of the trip for me was sprinting on GeoPandas and other projects with Kelsey Jordahl, Nora Deram, Jacob Wasserman (remotely), Matt Perry, Taylor Oshan, Carson Farmer, Shawn Walbridge, Serge Ray, and Philip Stephens. I learned a ton about Pandas and GeoPandas, made some solid contributions to the project, and all in great company. It was a pleasure and a privilege.

One sobering thing is being reminded that not everyone enjoys the same pleasure and privilege. It's important to read and think about April Wright's thoughts on SciPy: http://wrightaprilm.github.io/posts/lonely.html.

Steady as she goes

Standards organizations are increasingly interested in the little format that could, and if http://www.w3.org/2014/05/geo-charter is any indication, it's likely that you'll be seeing more mention of GeoJSON and GeoJSON-LD in OGC and W3C materials soon. However, the GeoJSON working group (for which I'm speaking in just this and the following sentence) is going to keep its own independent course. Work on the GeoJSON Internet Draft will continue to be done at https://github.com/geojson/draft-geojson and on the GeoJSON discussion list.

Now, Mapbox has recently joined the OGC and I've subsequently received emails from OGC members seeking discussion of GeoJSON. This is exactly the thing that I want to avoid and I will decline to discuss GeoJSON outside of the venues I listed above. I hope you will, too, because discussions of GeoJSON on closed OGC technical committee lists or private emails with other OGC members aren't going to do the GeoJSON format, I-D, or users any good. GeoJSON is for the open web. Let's continue to tend it on the open web.