RFC 8142 is the second and final
deliverable of the IETF's GeoJSON working group. It standardizes sequences of
GeoJSON texts and a media type you can use to tell receivers "here comes
a sequence of GeoJSON Feature objects, not a GeoJSON FeatureCollection." This
is useful because a GeoJSON FeatureCollection must be read in its entirety
before it can be parsed. It's a blob. A text blob, not a binary one, but
a blob nonetheless. A FeatureCollection becomes unwieldy as the number of
features increases. Dynamic feed-like streams of features (consider a stream of
OSM edits or stream of features extracted in real time from imagery) also need
a different kind of representation from a static array of Feature objects.
Newline-delimited sequences of GeoJSON objects are being employed by some
projects, including a few at Mapbox. In a newline-delimited sequence the
individual features must use a compact form. No pretty-printed features are
permitted. If you're aggregating features produced by other services, you must
parse them and reserialize them in compact form.
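Here's a minimal sketch of that reserialization step, in JavaScript to match the snippet later in this post. toNewlineDelimited is a hypothetical helper name, and the input is assumed to be an array of GeoJSON strings already in memory.

```javascript
// Hypothetical helper: turn GeoJSON texts from other sources (possibly
// pretty-printed) into a newline-delimited sequence. Each text is parsed and
// re-serialized compactly, so no raw newline can appear inside an item.
function toNewlineDelimited(texts) {
  return texts
    .map((text) => {
      // JSON.stringify without an indent argument emits compact JSON, and
      // newlines inside string values are escaped as \n.
      return JSON.stringify(JSON.parse(text)) + '\n';
    })
    .join('');
}

const pretty = '{\n  "type": "Feature",\n  "geometry": null,\n  "properties": {}\n}';
const compact = '{"type":"Feature","geometry":null,"properties":{"id":1}}';
process.stdout.write(toNewlineDelimited([pretty, compact]));
```

The parse/stringify round trip is the cost the newline-delimited approach imposes on aggregators: every item has to be fully decoded and re-encoded just to get rid of its whitespace.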
RFC 8142 describes a format for sequences of features that may be compact or
pretty-printed. Mixed sequences are also possible. The trick is that every
sequence item must begin with an ASCII Record Separator (RS), 0x1E, and end
with a newline. Two delimiters. The first allows formatted, pretty-printed
texts within a sequence; the second guards against truncated sequence records.
That's it. There's not a lot to RFC 8142 other than this and the definition of
a new internet media type to mark this kind of data stream.
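To make the framing concrete, here's a minimal sketch of a writer. toTextSequence is a hypothetical name; it prefixes each serialized object with RS and appends a newline, which is the JSON text sequence framing (RFC 7464) that RFC 8142 applies to GeoJSON.

```javascript
const RS = '\x1e'; // ASCII Record Separator

// Hypothetical writer: every item is RS + a JSON text + LF.
// `indent` is optional; pass 2 for pretty-printed items. Pretty-printing is
// allowed because RS, not LF, marks the start of each item.
function toTextSequence(objects, indent) {
  return objects
    .map((obj) => RS + JSON.stringify(obj, null, indent) + '\n')
    .join('');
}

const features = [
  { type: 'Feature', geometry: { type: 'Point', coordinates: [0, 0] }, properties: {} },
  { type: 'Feature', geometry: null, properties: { name: 'b' } },
];

// Compact and pretty-printed items may be mixed in one sequence.
const seq = toTextSequence(features.slice(0, 1)) + toTextSequence(features.slice(1), 2);
// Print with escapes visible, since RS is a non-printing character.
console.log(JSON.stringify(seq));
```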
Sprinkling RS in your file sort of turns it into a binary file. Python's
open() function, for example, does not accept newline=u'\x1e' and so cannot
provide you an iterator over RS-delimited records. You may have to write
your own readLine() type of function to get individual items from the
stream. It's not the end of the world, but it does add some friction. Vladimir
Agafonkin tells me that this is the way to do it in JavaScript:
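var fs = require('fs');
var split = require('binary-split');

fs.createReadStream('data.foo')
  .pipe(split('\x1E'))
  .on('data', function (buf) {
    // Skip empty records, such as any empty piece before the first RS.
    var text = buf.toString().trim();
    if (!text) return;
    var geojson = JSON.parse(text); // do something with each item here
  });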
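If you'd rather not add a dependency, the splitting can also be done with Node's standard library. The sketch below is not Vladimir's code: readTextSequence and parseRecord are hypothetical names, and it assumes input arrives as a stream of Buffer or string chunks. It also uses the trailing newline the way the format intends: a record that doesn't end in LF is reported as possibly truncated instead of being parsed.

```javascript
const { Readable } = require('stream');

// Hypothetical reader: yields { value } for each parsed item, or
// { error, text } for records that are possibly truncated or invalid.
async function* readTextSequence(stream) {
  let pending = Buffer.alloc(0);
  for await (const chunk of stream) {
    pending = Buffer.concat([pending, Buffer.from(chunk)]);
    let idx;
    // 0x1E never occurs inside a multi-byte UTF-8 character, so splitting
    // raw bytes on it is safe even when a chunk ends mid-character.
    while ((idx = pending.indexOf(0x1e)) !== -1) {
      yield* parseRecord(pending.subarray(0, idx));
      pending = pending.subarray(idx + 1);
    }
  }
  yield* parseRecord(pending); // the last record has no RS after it
}

function* parseRecord(buf) {
  if (buf.length === 0) return; // e.g. the empty piece before the first RS
  const text = buf.toString('utf8');
  if (!text.endsWith('\n')) {
    // Missing LF: the record may have been cut short.
    yield { error: 'possibly truncated', text };
    return;
  }
  try {
    yield { value: JSON.parse(text) };
  } catch (err) {
    yield { error: err.message, text };
  }
}

async function demo() {
  // Chunk boundaries deliberately fall in the middle of items.
  const input = Readable.from([
    '\x1e{"type":"Feature","geometry":null,"properties":{}}\n\x1e{\n  "type": "Fea',
    'ture",\n  "geometry": null,\n  "properties": {}\n}\n',
    '\x1e{"type":"Feature"', // truncated: no closing brace, no LF
  ]);
  for await (const item of readTextSequence(input)) console.log(item);
}
demo();
```

With a file, pass fs.createReadStream('data.jsonseq') instead of the in-memory stream. Splitting on bytes rather than decoded strings is the design choice that keeps multi-byte characters intact across chunk boundaries.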
There is already support for GeoJSON text sequences in programs that I use
often like GNU Parallel, jq, and fio. In parallel's --pipe mode, the
--recstart option will split records on RS and --rrs will remove the RS
from the output.
parallel --pipe --rrs --recstart '\x1E' cat < data.jsonseq
The current version of jq, 1.5, will read and write RS-delimited sequences if
you pass the --seq option.
jq --seq -c '.' data.jsonseq
Fiona's fio-cat will emit RS if
you use its --rs option. This is required if you want pretty-printed
features. Otherwise fio-cat writes compact GeoJSON delimited only by newlines.
The complementary fio-collect and fio-load commands accept either
newline-delimited sequences or GeoJSON text sequences.
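This is not how fio-collect is implemented; it's a hypothetical sketch of one way a consumer could accept either form, by checking whether the first non-whitespace character is RS. parseEitherSequence is an illustrative name, and the sketch reads the whole input into memory and skips the truncation check.

```javascript
const RS = '\x1e';

// Hypothetical helper: parse text holding either a GeoJSON text sequence
// (items start with RS) or a newline-delimited sequence (one compact item
// per line). Only the first non-whitespace character decides the format.
function parseEitherSequence(text) {
  // Skip leading spaces, tabs, and line breaks explicitly.
  const start = text.search(/[^ \t\r\n]/);
  if (start === -1) return [];
  const isRsSequence = text[start] === RS;
  const records = isRsSequence ? text.split(RS) : text.split('\n');
  return records
    .filter((r) => r.trim() !== '') // drop empty pieces between delimiters
    .map((r) => JSON.parse(r));
}

console.log(parseEitherSequence('{"type":"Feature","geometry":null,"properties":{}}\n'));
console.log(parseEitherSequence('\x1e{\n  "type": "Feature",\n  "geometry": null,\n  "properties": {}\n}\n'));
```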
Note that there's no recommended file extension for GeoJSON text sequences.
The format is intended for network protocols and not for files. If you do save
them to files it would be best not to use .json or .geojson as an extension
because a delimited sequence of GeoJSON (RS or not) isn't valid JSON.
Note also that while the format technically allows mixed sequences containing
GeoJSON FeatureCollection, Feature, and Geometry objects, the semantics of
these kinds of mixed sequences are unlikely to be understood by consumers.
Streams of features seem to me like the best application for this format right
now.
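If a consumer does receive a mixed sequence, one conservative option is to keep the Features and set everything else aside. keepFeatures is a hypothetical helper that operates on already-parsed objects; whether to unpack FeatureCollections instead of skipping them is an application decision that the format leaves to you.

```javascript
// Hypothetical helper: keep Feature objects from a mixed sequence and count
// everything else (FeatureCollection, bare geometries, non-objects) by type.
function keepFeatures(items) {
  const features = [];
  const skipped = {};
  for (const item of items) {
    if (item && item.type === 'Feature') {
      features.push(item);
    } else {
      const type = item && typeof item.type === 'string' ? item.type : 'unknown';
      skipped[type] = (skipped[type] || 0) + 1;
    }
  }
  return { features, skipped };
}

const result = keepFeatures([
  { type: 'Feature', geometry: null, properties: {} },
  { type: 'Point', coordinates: [0, 0] },
  { type: 'FeatureCollection', features: [] },
]);
console.log(result.features.length, result.skipped);
```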
Thanks to the following people and organizations: Erik Wilde and Martin
Thomson, the WG chairs; Alissa Cooper, Area Director; the RFC Editor and IETF
reviewers; Mark Baker, Sean Leonard, and Ned Freed for comments on the media
type; WG participants Martin Daly, Stephan Drees, Kevin Wurster, Matthew Perry,
Allan Doyle, Carl Reed, Jerry Sievert, Peter Vretanos, and Howard Butler; and
Mapbox, my employer, for allowing me time to edit the doc.