RFC 8142: GeoJSON Text Sequences

Sean Gillies

2017-05-18 10:22

RFC 8142 is the second and final deliverable of the IETF's GeoJSON working group. It standardizes sequences of GeoJSON texts and a media type you can use to tell receivers "here comes a sequence of GeoJSON Feature objects, not a GeoJSON FeatureCollection." This is useful because a GeoJSON feature collection must be read in its entirety before it can be parsed [1]. It's a blob. A text – not binary – blob, but a blob nonetheless. A FeatureCollection becomes unwieldy as the number of features increases. Dynamic feed-like streams of features (consider a stream of OSM edits or a stream of features extracted in real time from imagery) also need a different kind of representation from a static array of Feature objects.

Newline-delimited sequences of GeoJSON objects are being employed by some projects, including a few at Mapbox. In a newline-delimited sequence the individual features must use a compact form. No pretty-printed features are permitted. If you're aggregating features produced by other services, you must parse them and reserialize them in compact form.
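As a rough illustration of that reserialization step, here is a minimal Python sketch. It assumes each incoming feature arrives as a JSON string; the function name to_ndjson_line is hypothetical, not part of any library.

```python
import json


def to_ndjson_line(feature_text):
    """Parse one GeoJSON text and reserialize it on a single line."""
    obj = json.loads(feature_text)
    # Without an indent argument, json.dumps emits no literal newlines,
    # so the result is safe to use as one newline-delimited record.
    return json.dumps(obj, separators=(",", ":")) + "\n"


# A pretty-printed feature, as another service might produce it.
pretty = """{
  "type": "Feature",
  "geometry": {"type": "Point", "coordinates": [0.0, 0.0]},
  "properties": {"name": "null island"}
}"""

line = to_ndjson_line(pretty)
```

The cost is the parse and reserialize round trip for every feature, which is exactly the friction the RS-delimited format avoids.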

RFC 8142 describes a format for sequences of features that may be compact or pretty-printed. Mixed sequences are also possible. The trick is that every sequence item must begin with an ASCII Record Separator (RS), 0x1E, and end with a newline. Two delimiters. The first allows formatted, pretty-printed texts within a sequence; the second guards against truncated sequence records. That's it. There's not a lot to RFC 8142 other than this and the definition of a new internet media type, application/geo+json-seq, to mark this kind of data stream.
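The framing described above can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the function name encode_seq and the sample features are hypothetical.

```python
import json

RS = "\x1e"  # ASCII Record Separator


def encode_seq(features, indent=None):
    """Frame each feature as RS + JSON text + newline."""
    return "".join(
        RS + json.dumps(feature, indent=indent) + "\n" for feature in features
    )


features = [
    {"type": "Feature", "geometry": None, "properties": {"n": 1}},
    {"type": "Feature", "geometry": None, "properties": {"n": 2}},
]

# A mixed sequence: the first item compact, the second pretty-printed.
text = encode_seq(features[:1]) + encode_seq(features[1:], indent=2)
```

Because every item starts with RS, the newlines inside the pretty-printed item are harmless: a reader splits on RS, not on newlines.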

Sprinkling RS in your file sort of turns it into a binary file. Python's open() function, for example, does not accept newline=u'\x1e' and cannot provide you an iterator over RS-delimited records. You may have to write your own readLine() type of function to get individual items from the stream. It's not the end of the world, but it does add some friction. Vladimir Agafonkin tells me that this is the way to do it in JavaScript:

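var fs = require('fs');
var split = require('binary-split');

fs.createReadStream('data.foo')
  .pipe(split('\x1E'))  // one chunk per RS-delimited record
  .on('data', function (buf) {
    var text = buf.toString().trim();
    // Guard against an empty chunk before the first RS.
    if (!text) return;
    var geojson = JSON.parse(text);
  });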
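For Python, a hand-rolled reader along the same lines might look like the sketch below. It assumes a binary stream with a read() method; read_seq and _items are hypothetical names, and silently skipping an item that lacks its trailing newline is my own choice for handling truncation, not something the RFC mandates.

```python
import io
import json

RS = b"\x1e"  # ASCII Record Separator


def _items(texts):
    """Parse complete items, skipping empty or truncated ones."""
    for text in texts:
        if not text.strip():
            continue  # nothing before the first RS
        if not text.endswith(b"\n"):
            continue  # no trailing newline: assume a truncated record
        yield json.loads(text.decode("utf-8"))


def read_seq(stream, chunk_size=8192):
    """Yield parsed objects from a binary stream of RS-delimited texts."""
    buf = b""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        buf += chunk
        # Everything before the last RS is complete; keep the rest buffered.
        *complete, buf = buf.split(RS)
        yield from _items(complete)
    # Whatever remains at end of stream is the final item.
    yield from _items([buf])


data = (
    b'\x1e{"type": "Feature",\n "id": 1}\n'
    b'\x1e{"type": "Feature", "id": 2}\n'
    b'\x1e{"type": "Feature", "id": 3'  # cut off mid-record
)
items = list(read_seq(io.BytesIO(data), chunk_size=4))
```

Splitting the raw bytes on 0x1E is safe for UTF-8 input because that byte never occurs inside a multi-byte character.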

There is already support for GeoJSON text sequences in programs that I use often like GNU Parallel, jq, and fio. In parallel's --pipe mode, the --recstart option will split records on RS and --rrs will remove the RS from the output.

parallel --pipe --rrs --recstart '\x1E' cat < data.jsonseq

The current version of jq, 1.5, will read and write RS-delimited sequences if you pass the --seq option.

jq --seq -c '.' data.jsonseq

Fiona's fio-cat will emit RS if you use its --rs option. This is required if you want pretty-printed features. Otherwise fio-cat writes compact GeoJSON delimited only by newlines. The complementary fio-collect and fio-load commands accept either newline-delimited sequences or GeoJSON text sequences.

Note that there's no recommended file extension for GeoJSON text sequences. The format is intended for network protocols and not for files. If you do save them to files it would be best not to use .json or .geojson as an extension because a delimited sequence of GeoJSON (RS or not) isn't valid JSON.

Note also that while the format technically allows mixed sequences containing GeoJSON FeatureCollection, Feature, and Geometry objects, the semantics of these kinds of mixed sequences are unlikely to be understood by consumers. Streams of features seem to me like the best application for this format right now.

Thanks to the following people and organizations: Erik Wilde and Martin Thomson, the WG chairs; Alissa Cooper, Area Director; the RFC Editor and IETF reviewers; Mark Baker, Sean Leonard, and Ned Freed for comments on the media type; WG participants Martin Daly, Stephan Drees, Kevin Wurster, Matthew Perry, Allan Doyle, Carl Reed, Jerry Sievert, Peter Vretanos, and Howard Butler; and Mapbox, my employer, for allowing me time to edit the doc.