Discount WFS Source for MapServer

How do I get my data into MapServer? Often, new MapServer users have a large legacy dataset that is not in a supported format. This might be a CSV file, a MySQL database, or a proprietary binary format. An overlooked solution to this problem involves WFS (Web Feature Service), the closest thing MapServer has to a generic data source. A Web Feature Service responds to requests for features with a XML document that contains features and attributes encoded using GML, the Geography Markup Language. There are a number of full-featured WFS, commercial and open source, but it is remarkably easy to roll your own that can serve as a lightweight MapServer data source.

Let's look at the simple case of publishing via WFS the data in a test CSV file. Granted, this is not the most compelling case, but is easy to reproduce and a good starting point. Consider a file with a header row and six point feature rows:

"FID:int","FNAME:string","SCORE:float","X:float","Y:float"
1,"Point A",0.2,-110.5,44.3
2,"Point B",0.4,-106.7,37.9
3,"Point C",0.15,-109.5,41.6
4,"Point D",0.6,-108.1,42.5
5,"Point E",0.8,-107.4,39.8
6,"Point F",0.55,-108.6,38.2

Each feature has an integer ID, a name, a float type score attribute, and, as is often the case with legacy or non-GIS data, coordinates as separate x and y values.

I will render this into XML using ElementTree, a flexible module for programming hierarchical data structures. The reading of data, and creation and arrangement of elements will be handled by a CSVFeatureCollection class. A toxml() method will return a WFS feature collection encoded as an XML string, and be used as the core of a response to a GetFeature request. To keep the example as simple as possible, I will ignore all WFS request parameters; this cheap-o WFS returns every feature from the data source. Because MapServer is a simple and forgiving WFS client, I can also neglect the GetCapabilities and DescribeFeatureType responses required of a full-featured WFS.

Since the WFS response makes much use of qualified names for element tags, I first define two functions for convenience:

# begin module csvsource.py

import csv
import elementtree
from elementtree.ElementTree import Element, QName, tostring

# Convenience wrappers for the Element factory
# --------------------------------------------

def WFSElement(tag):
    e = Element(tag)
    e.tag = QName('http://www.opengis.net/wfs', e.tag)
    return e

def GMLElement(tag):
    e = Element(tag)
    e.tag = QName('http://www.opengis.net/gml', e.tag)
    return e

Create the root element, defining the required WFS and GML namespaces, and creating the element which describes the bounding box for features in the response. One might use mapscript or the OGR Python module to compute the proper extents. In the interest of simplicity, I'm taking a shortcut and hard-coding the bounding box:

class CSVFeatureCollection:

    """Express the 'points.csv' file as a WFS Feature Collection"""

    def toxml(self):
        """Return an XML string, sans prolog"""
        # root element
        e_root = WFSElement('FeatureCollection')
        e_root.attrib['xmlns'] = 'http://localhost/myns'
        e_root.attrib[QName('http://localhost/myns', 'wfs')] \
                   = 'http://www.opengis.net/wfs'
        e_root.attrib[QName('http://localhost/myns', 'gml')] \
                   = 'http://www.opengis.net/gml'

        # bounds
        e_bounds = GMLElement('boundedBy')
        e_box = GMLElement('Box')
        e_box.attrib['srsName'] = 'EPSG:4326'
        e_coords = GMLElement('coordinates')
        e_coords.text = '-120,30 -100,50'
        e_box.append(e_coords)
        e_bounds.append(e_box)
        e_root.append(e_bounds)

Next, open the data file and iterate over its rows, producing and attaching new elements, and then returning the XML string:

# continued ...

        # open data file and iterate over rows
        f = file('points.csv', 'r')
        reader = csv.reader(f)
        headers = reader.next()
        self._fields = [(h.split(':')) for h in headers]

        for row in reader:
            if not row:
                break

            e_memb = GMLElement('featureMember')
            e_type = Element('points')

            # member bounds
            e_bounds = GMLElement('boundedBy')
            e_box = GMLElement('Box')
            e_box.attrib['srsName'] = 'EPSG:4326'
            e_coords = GMLElement('coordinates')
            e_coords.text = '%s,%s %s,%s' % (row[3], row[4], row[3], row[4])
            e_box.append(e_coords)
            e_bounds.append(e_box)
            e_type.append(e_bounds)

            # member properties
            e_ptprop = GMLElement('pointProperty')
            e_point = GMLElement('Point')
            e_point.attrib['srsName'] = 'EPSG:4326'
            e_coords = GMLElement('coordinates')
            e_coords.text = '%s,%s' % (row[3], row[4])
            e_point.append(e_coords)
            e_ptprop.append(e_point)
            e_type.append(e_ptprop)

            # other props
            for i in range(3):
                e = Element(self._fields[i][0])
                e.text = row[i]
                e_type.append(e)

            e_memb.append(e_type)
            e_root.append(e_memb)

        f.close()
        return tostring(e_root)

This module could be adapted to a number of different data sources. Instead of reading from a CSV file, one might make a database query, or fetch and parse RSS files.

The result of executing:

import csvsource
csvsource.CSVFeatureCollection().toxml()

is:

<?xml version="1.0" ?>
<ns0:FeatureCollection xmlns="http://localhost/myns"
    ns1:gml="http://www.opengis.net/gml"
    ns1:wfs="http://www.opengis.net/wfs"
    xmlns:wfs="http://www.opengis.net/wfs"
    xmlns:ns1="http://localhost/myns">
  <ns2:boundedBy xmlns:gml="http://www.opengis.net/gml">
    <ns2:Box srsName="EPSG:4326">
      <ns2:coordinates>-120,30 -100,50</ns2:coordinates>
    </ns2:Box>
  </ns2:boundedBy>
  <ns2:featureMember xmlns:gml="http://www.opengis.net/gml">
    <points>
      <ns2:boundedBy>
        <ns2:Box srsName="EPSG:4326">
          <ns2:coordinates>-110.5,44.3 -110.5,44.3</ns2:coordinates>
        </ns2:Box>
      </ns2:boundedBy>
      <ns2:pointProperty>
        <ns2:Point srsName="EPSG:4326">
          <ns2:coordinates>-110.5,44.3</ns2:coordinates>
        </ns2:Point>
        </ns2:pointProperty>
      <FID>1</FID>
      <FNAME>Point A</FNAME>
      <SCORE>0.2</SCORE>
    </points>
  </ns2:featureMember>
  ...
</ns0:FeatureCollection>

This is close, but it is not yet going to work with MapServer. The XML Namespace specification states that applications should use the element namespace URL not the tag prefix, but this detail is handled improperly. MapServer looks for tags with 'wfs' and 'gml' prefixes as it parses the response and does not recognize the prefixes generated by our root element's writer.

The solution, easy to do with Python, is to monkey-patch the ElementTree module. I simply add some namespace and prefix mappings to the module's well-known namespaces immediately after it is imported at the top of csvstore.py:

import csv
import elementtree
from elementtree.ElementTree import Element, QName, tostring

# Monkey Patch adds to the default well known namespaces
elementtree.ElementTree._namespace_map.update({
    'http://www.opengis.net/wfs': 'wfs',
    'http://www.opengis.net/gml': 'gml'})

This gives us what MapServer expects:

<?xml version="1.0" ?>
<wfs:FeatureCollection xmlns="http://localhost/myns"
    ns1:gml="http://www.opengis.net/gml"
    ns1:wfs="http://www.opengis.net/wfs"
    xmlns:wfs="http://www.opengis.net/wfs"
    xmlns:ns1="http://localhost/myns">
  <gml:boundedBy xmlns:gml="http://www.opengis.net/gml">
    <gml:Box srsName="EPSG:4326">
      <gml:coordinates>-120,30 -100,50</gml:coordinates>
    </gml:Box>
  </gml:boundedBy>
  <gml:featureMember xmlns:gml="http://www.opengis.net/gml">
    <points>
      <gml:boundedBy>
        <gml:Box srsName="EPSG:4326">
          <gml:coordinates>-110.5,44.3 -110.5,44.3</gml:coordinates>
        </gml:Box>
      </gml:boundedBy>
      <gml:pointProperty>
        <gml:Point srsName="EPSG:4326">
          <gml:coordinates>-110.5,44.3</gml:coordinates>
        </gml:Point>
      </gml:pointProperty>
      <FID>1</FID>
      <FNAME>Point A</FNAME>
      <SCORE>0.2</SCORE>
    </points>
  </gml:featureMember>
...
</wfs:FeatureCollection>

All that's left is a means to serve this data. One might use a Python CGI script and Apache, or use the code above in a Zope external method that is served to the web via Python Script. Instead, for extra flavor, I'm going to use twisted.web and write a Python resource script to answer GetFeatures requests:

# begin features.rpy

from twisted.web import server
from twisted.web.resource import Resource
from twisted.internet import reactor, defer

import csvstore

class Features(Resource):

    def get_data(self):
        d = defer.Deferred()
        reactor.callLater(0.25, d.callback,
                          csvstore.CSVFeatureCollection().toxml())
        return d

    def render_GET(self, request):
        """a default method called when the object is published"""
        request.setHeader('Content-type', 'text/xml')
        request.write('<?xml version="1.0" ?>')
        d = self.get_data()
        d.addCallback(lambda x:(request.write(x+'\n'), request.finish()))
        d.addErrback(lambda x:(request.write(x), request.finish()))
        return server.NOT_DONE_YET

    render_POST = render_GET

resource = Features()

The entire WFS and data consists of three files: points.csv, csvstore.py, and features.rpy, in a directory such as '/var/www/cheapowfs'. It can be put online using Twisted's mktap and twistd utilities:

mktap web -p 8080 --path /var/www/cheapowfs
twistd -f /var/www/cheapowfs/web.tap

All files: cheapowfs.tar.gz

This WFS can be accessed by MapServer using a layer definition like the one below:

LAYER
   NAME "cheapo_wfs_points"
   TYPE POINT
   STATUS DEFAULT
   CONNECTIONTYPE WFS
   CONNECTION "http://localhost:8080/features.rpy?"
   METADATA
     "wfs_service" "WFS"
     "wfs_typename" "points"
     "wfs_version" "1.0.0"
   END
   LABELITEM "FNAME"
   PROJECTION "+init=epsg:4326" END
END

Hobu and I will be expanding on this example in our Python workshop this summer, and I'll be back to it with a few more interesting examples in the next few weeks.

Comments

Re: Discount WFS Source for MapServer

Author: Myunghwa Hwang

Sean, would you mind letting me know about writing a general python cgi script which can serve as features.rpy? This example is what I looked for, except features.rpy. from Myunghwa Hwang

Re: Discount WFS Source for MapServer

Author: Sean

The quick and dirty CGI would be something like this: ... print 'Content-type: text/html\n\n' print '' print csvstore.CSVFeatureCollection().toxml()