Open access to National GIS data

A corollary to Jeff Thurston's grammatically challenged geospatial thought for the day:

Let’s be clear: If government pays for geodata, then makes it available for free. Then it is not free. You ARE paying for it.

is this:

If you're paying for it, you own it, and should have the right to unfettered access to unclassified portions of it.

The National Institutes of Health mandates open access to the published results of science it funds. Similar open access to all publicly funded research is currently the 12th-ranked suggestion to Obama's future CTO. An equivalent policy for National GIS data is, in my opinion, a must. I don't mean access to a service endpoint; I mean access to shapefile downloads.

I believe I will write my new Senator, Mark Udall (do I ever love typing that phrase!), and see if he's interested in doing something about it.

Update (2009-01-16): related, more thoughtful post here.

Update (2009-01-28): more from Sean Gorman and Paul Ramsey.

Comments

Re: Open access to National GIS data

Author: Kirk

I don't think I'd like for the public to have access to precise locations of archaeological sites, would you?

Re: Open access to National GIS data

Author: Eric Wolf

I guess people don't read before they make suggestions. The Obama platform specifically cited increased access to government information as an important goal of his administration. And open access is generally the norm at the USGS. I believe that, by law, USGS-collected data cannot be copyrighted and is free (libre) for any use. Unfortunately, there are so many snafus related to the free (gratis) problem that the bureaucrats get stuck in a tailspin. For the past eight years, the Department of the Interior has operated under a mantra of "we must become like a commercial operation" because, as we all know, the market is always right... right?

We also have the technical issue that USGS data is not in shapefile format, because of the magnitude of the data and the diversity of the data types. Most of the data is stored as custom geodatabases, sometimes centralized but frequently distributed. Providing service endpoints is easier than providing shapefiles, especially for the centralized geodatabases: all we have to do is front-end the database with the appropriate protocol. The Seamless server (http://seamless.usgs.gov/website/seamless/viewer.htm) already provides shapefile downloads. But because of the way the data is stored at the USGS, it must first be extracted from the databases and then turned into a shapefile.

The debate, really, is this: would you rather the USGS spend your tax dollars maintaining a database structure (i.e., independent shapefiles like "transportation for Colorado") that doesn't fit the Survey's own internal needs for its mission of furthering environmental science? In the past, the USGS charged for data delivery to help compensate for this difference between internal and external data format needs. Of course, if you take this to the next step, you get the FGDC and SDTS. I won't embarrass myself by going there...

I'd suggest, in addition to writing to Udall, also CC'ing that other famous Colorado politician, Ken Salazar, the incoming Secretary of the Interior. Salazar's role is in interpreting administrative guidelines into policy for the USGS and the rest of the DOI.
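Eric's "extract from the database, then turn it into a file" step can be sketched in a few lines. This is a minimal illustration under loud assumptions: it writes GeoJSON rather than a shapefile (writing shapefiles needs a third-party library such as pyshp or GDAL/OGR), it uses an in-memory SQLite table as a stand-in, and the table name and columns (`gauges`, `id`, `name`, `lon`, `lat`) are hypothetical. Real USGS geodatabases are far larger and more varied.

```python
import json
import sqlite3

def export_points_to_geojson(conn: sqlite3.Connection, table: str) -> str:
    """Extract every row of a point table and return it as a GeoJSON FeatureCollection.

    Assumes a hypothetical schema with columns id, name, lon, lat.
    """
    # Table names can't be bound as SQL parameters, so accept plain identifiers only.
    if not table.isidentifier():
        raise ValueError(f"unsafe table name: {table!r}")
    features = []
    for fid, name, lon, lat in conn.execute(f"SELECT id, name, lon, lat FROM {table} ORDER BY id"):
        features.append({
            "type": "Feature",
            "id": fid,
            # GeoJSON (RFC 7946) orders coordinates as [longitude, latitude].
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": {"name": name},
        })
    return json.dumps({"type": "FeatureCollection", "features": features})

# Usage with an in-memory stand-in for an internal database (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gauges (id INTEGER PRIMARY KEY, name TEXT, lon REAL, lat REAL)")
conn.execute("INSERT INTO gauges VALUES (1, 'Example gauge', -105.27, 40.01)")
print(export_points_to_geojson(conn, "gauges"))
```

The extraction query and the output format are the only moving parts; the cost Eric describes comes from running something like this across many heterogeneous databases on every update.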

Re: Open access to National GIS data

Author: Sean

Kirk, as far as I'm concerned that's another kind of classified. Moot here, because archaeology and cultural heritage aren't part of the National GIS proposal, but the same issue does come up in regard to wildlife habitat. There are people who might bring on the bulldozers upon discovering that their property intersects with endangered species habitat.

There is a mind-blowing cave near my old hometown, Logan, Utah. As a kid I went in there a bunch of times. Increasing numbers of visitors, some of whom camped inside, built fires, etc., made life hard for Townsend's big-eared bats. The Forest Service tried some seasonal closures to protect the bat population, and some hillbillies (who are probably related to me -- this is Utah, after all) responded by trying to eliminate the bats. The cave is now gated, and closed. Sadly, I don't think this kind of vandalism is particular to the Intermountain West.

The paranoid may say parcel data likewise needs to be kept out of the hands of evil-doers, but I think this is bogus.

Thanks for the Salazar reminder, Eric. I busted my ass for him in 2004, and he owes me a favor ;)

Re: Open access to National GIS data

Author: Dave Smith

There are quite a few different reasons why access might be controlled: not just sensitivity due to national security, archaeological or natural security, but also others. For example, governmental regulation of business may make government privy to information about a company's business processes, suppliers, and so on, which might otherwise be confidential trade secrets. However, I tend to think that the datasets which genuinely require sensitivity are the exception to the rule. The vast majority should be open and accessible.

Another consideration is that many governmental entities also face unfunded mandates which dictate that they collect and manage data. How to pay for it? Charging users is one model, unfortunately. Or... don't collect the data at all. Or... rob Peter to pay Paul: borrow a little funding from another program and get the most basic data collected, which in turn might not be in an easily sharable form. Many obstacles.

Should USGS be maintaining data outside of their own mandate? Probably not. But meanwhile, can they access said data from DOT/FHWA or other sources in a seamless fashion? Heck no. So everywhere across government, we have all these disconnected little stovepipes, which, without the rest of the background data, would generally be of limited utility.

FGDC, "GIS for the Nation", GOS, OMB Geospatial Line of Business and all of these should be pursuing a national FRAMEWORK for providing this. They have accomplished a few things here and there, but the technical architecture is still sorely lacking. And without sound guidance, governance, and a solid national architecture and framework, the Dangermond proposal could end up merely propagating more of the same. Who manages and houses what? How is the data to be published, discovered and accessed? Technology is not the hurdle. The hurdle is cultural.

Re: Open access to National GIS data

Author: Sean

Eric, did you bring up the Seamless app as an example of how data should be shared? It is so wrong, in so many ways. I'm not counting on the usual suspects to deliver anything better for a national GIS, and that's why I'm saying the USGS should just release the data periodically and let others remix it into useful services.

Re: Open access to National GIS data

Author: Kirk

Sean, I hadn't really thought about wildlife data being classified, but see what you mean. I live not far from Bracken Cave, which seems fairly well protected by BCI ... http://tinyurl.com/brackencave. Notice how "find bat locations" takes you to a page that tells you everything about the cave and its bats - except for the location. I was thinking more about the antiquities databases you build tools for. Does ISAW try to discourage treasure hunters from gaining access?

Re: Open access to National GIS data

Author: Tyler Erickson

There was a fairly good keynote talk related to this subject last December at the AGU Fall Meeting, an academic scientific conference that draws 15,000+ attendees. Michael Jones of Google spoke on spreading scientific knowledge, and one of his main points was that all government-funded research should require open publishing of the work (data, source code, and results) so that others can easily reproduce and build upon it. The talk seemed to be well received, given that most of the audience members depend on government funding for the majority of their research. At least I hope they see the big picture and agree with it: if everyone gives away their one precious dataset or algorithm, everyone will have access to thousands of new datasets and algorithms to use in their own research.

Re: Open access to National GIS data

Author: Sean

Kirk, I do think that location obscurity is the very least that digital antiquities people should provide for sensitive archaeological sites that can't be better secured. ISAW projects are different: we aggregate and provide tools for study of already known places, people, and texts. Databases of hidden treasures aren't part of our mission. Ideally, our workflow engine and editorial board publish no material before its time, but that's principally to maintain a high standard of scholarship.
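One common way to provide the "location obscurity" Sean mentions is to publish only the grid cell a sensitive site falls in, not its coordinates. The sketch below is a hypothetical illustration, not how ISAW or any other project does it; the function name, the 0.1 degree default cell size, and the sample coordinates are all assumptions.

```python
import math
from typing import Tuple

def obscure_location(lat: float, lon: float, cell_deg: float = 0.1) -> Tuple[float, float]:
    """Replace a precise location with the centre of the grid cell that contains it.

    cell_deg is the cell size in degrees (0.1 degree is roughly 11 km of latitude).
    """
    if cell_deg <= 0:
        raise ValueError("cell_deg must be positive")
    # floor() (not int()) so negative values, e.g. western longitudes, land in the right cell.
    row = math.floor(lat / cell_deg)
    col = math.floor(lon / cell_deg)
    return ((row + 0.5) * cell_deg, (col + 0.5) * cell_deg)

# Two nearby hypothetical sites fall in the same cell and become indistinguishable.
print(obscure_location(40.0123, -105.2711))
print(obscure_location(40.0871, -105.2255))
```

Snapping is deterministic on purpose: if each request instead added fresh random jitter, someone could query repeatedly and average the noise away.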

Re: Open access to National GIS data

Author: Dave Smith

Carrying over discussion from Sean Gorman's site: the issue with just providing data (e.g. shapefiles) is that it requires a process of download, conversion, and so on. In this process, how often do you update? Do you have, or provide, adequate metadata to know whether or not it's the most current data? Do you then need to build a refresh process, scheduling a mechanism to perform the download and update on your end? Are there going to be dozens of other stakeholders all making redundant investments in the same type of refresh process?

With Apps for Democracy et al., it went beyond just "data" to directly mashable data feeds, and this can be a means of providing and ensuring currentness, via KML network links, live GeoRSS feeds et al. Part of my concern is economies of scale (why not build it once, use it many times) and potential liabilities, e.g. folks who might not be diligent in routinely updating the datasets that feed their apps.

The easiest solution would be to just publish a live feed. Have agencies provide direct data access via KML network links, GeoRSS, WxS services, and tile services, e.g. GeoServer. With a modicum of infrastructure planning, this could be quite scalable and robust, and serve the vast majority of need across the entire community. And the data would reside in place with each steward, in a federated NSDI. This is basic stuff, not complicated star-wars physics.

The flip side of the equation is in data collection efforts, e.g. EPA's Exchange Network, which collects data from all 50 states, tribes and other participants. Or... you have OAM, a great idea for crowdsourced data, but what happened here? Again, an infrastructure crunch, needing sponsorship and funding. "Just do it" is all fine and good, but it definitely has its practical limits, particularly when dealing with an entire national dataset and applications which require cross-agency and inter-agency data.
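The refresh question Dave raises ("how often do you update?") is partly answered by standard HTTP caching metadata: a consumer can send a conditional GET and a server that honors it replies 304 Not Modified when nothing has changed. The sketch below only builds the headers and compares timestamps; it assumes the data publisher sends a `Last-Modified` header, which is not guaranteed for any particular agency's server, and the function names are hypothetical.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Dict, Optional

def conditional_headers(local_copy_time: datetime, etag: Optional[str] = None) -> Dict[str, str]:
    """HTTP headers for a conditional GET, so an unchanged dataset isn't downloaded again.

    local_copy_time must be timezone-aware.
    """
    headers = {
        "If-Modified-Since": format_datetime(local_copy_time.astimezone(timezone.utc), usegmt=True)
    }
    if etag:
        headers["If-None-Match"] = etag
    return headers

def needs_refresh(local_copy_time: datetime, last_modified: Optional[str]) -> bool:
    """Compare the server's Last-Modified header with the time of our local copy."""
    if not last_modified:
        # No currentness metadata at all: the only safe assumption is that we're stale.
        return True
    return parsedate_to_datetime(last_modified) > local_copy_time

# Usage with a hypothetical fetch time and a hypothetical Last-Modified value.
fetched = datetime(2009, 1, 20, tzinfo=timezone.utc)
print(conditional_headers(fetched))
print(needs_refresh(fetched, "Wed, 28 Jan 2009 12:00:00 GMT"))
```

This reduces, but does not remove, the duplicated effort Dave describes: every consumer still has to schedule and run its own check.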
With respect to obscuring data, touch base with NatureServe; they are working on ways to allow site screening for sensitive/endangered species without exposing the actual location.
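The general idea behind screening without exposure is that the steward answers a yes/no question ("does this project site intersect sensitive habitat?") while the habitat polygons never leave its hands. The sketch below illustrates that idea with a plain ray-casting point-in-polygon test; it is not NatureServe's method, the class and method names are hypothetical, and it treats lon/lat as planar coordinates (fine for small areas, wrong across the antimeridian). In a real deployment only `screen()` would be exposed, as a service.

```python
from typing import Iterable, Sequence, Tuple

Point = Tuple[float, float]  # (x, y), e.g. (lon, lat)

def point_in_polygon(pt: Point, ring: Sequence[Point]) -> bool:
    """Even-odd (ray casting) test: is pt inside the closed ring?"""
    x, y = pt
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does a ray from pt toward +x cross this edge? (The test guarantees y1 != y2.)
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

class SensitiveSiteScreener:
    """Answers yes/no screening queries; the polygons themselves are never returned."""

    def __init__(self, habitat_rings: Iterable[Sequence[Point]]):
        self._rings = [list(r) for r in habitat_rings]

    def screen(self, site: Point) -> bool:
        return any(point_in_polygon(site, ring) for ring in self._rings)

# Usage with a hypothetical habitat polygon (lon, lat).
screener = SensitiveSiteScreener([[(-106.0, 39.0), (-105.0, 39.0), (-105.0, 40.0), (-106.0, 40.0)]])
print(screener.screen((-105.5, 39.5)))
print(screener.screen((-104.5, 39.5)))
```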

Re: Open access to National GIS data

Author: Sean

I invited trouble by reducing my desire for excellent, standardized, syndicated data to "shapefiles". I am in favor of funding agencies to create, manage, publish (using simple and robust mechanisms like RSS), and curate this data. My only objection is to the proposed shiny service architectures and portals; the GIS industry/community rarely gets that stuff right.
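Sean's "simple and robust mechanisms like RSS" can carry location using GeoRSS, one of the feed types Dave also mentions. Here is a minimal sketch of an RSS 2.0 feed whose items carry GeoRSS-Simple points, built with Python's standard library. The item keys, feed title, and example.com URLs are hypothetical; only the GeoRSS namespace and the latitude-first `georss:point` encoding come from the GeoRSS-Simple convention.

```python
import xml.etree.ElementTree as ET

GEORSS_NS = "http://www.georss.org/georss"
ET.register_namespace("georss", GEORSS_NS)

def build_georss_feed(title: str, link: str, items) -> str:
    """Build an RSS 2.0 feed whose items carry GeoRSS-Simple points.

    items: iterable of dicts with hypothetical keys title, link, lat, lon.
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = title
    for entry in items:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = entry["title"]
        ET.SubElement(item, "link").text = entry["link"]
        # GeoRSS-Simple puts latitude first: "lat lon", separated by a space.
        ET.SubElement(item, f"{{{GEORSS_NS}}}point").text = f"{entry['lat']} {entry['lon']}"
    return ET.tostring(rss, encoding="unicode")

feed = build_georss_feed(
    "Example facility updates",
    "http://example.com/facilities",
    [{"title": "Facility A", "link": "http://example.com/facilities/a", "lat": 39.74, "lon": -104.99}],
)
print(feed)
```

Any feed reader can consume this, and geo-aware clients can map it, which is the appeal over a bespoke portal.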

Re: Open access to National GIS data

Author: Eric Wolf

Sean: I brought up Seamless not as an example of how data should be served, but as an example of how the USGS is actively trying to come up with a scheme for providing the diverse range of data it collects, creates and maintains to a diverse user base. Essentially, the problem is similar to the Census DIME and TIGER files. The Census gives you dumps from the database and a schema to help you decode the data dump. The problem is the USGS doesn't have one database. We have many. And the larger databases are comparable in size and complexity to the Census data. And unlike the Census data, which is really only updated once every decade, many of the USGS databases are updated in real time.

I'm not trying to make excuses. I'm trying to help you understand the challenges. My colleagues and I in CEGIS at the USGS are actively trying to understand how best to manage data dissemination. So we appreciate being told what is wrong with what we are doing and what people actually want.
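The "dump plus a schema to decode it" model Eric describes works because the schema is just a record layout. Older TIGER/Line releases, for instance, were fixed-width text records documented by column positions. The sketch below shows the decoding side with a made-up layout; the field names and column positions are hypothetical and are not the real TIGER or DIME layout.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical record layout: (field, first column, last column), 1-based and inclusive,
# the way fixed-width record layouts are usually documented. Not the real TIGER layout.
Layout = List[Tuple[str, int, int]]
LAYOUT: Layout = [
    ("record_id", 1, 6),
    ("feature_name", 7, 26),
    ("county_code", 27, 29),
]

def decode_record(line: str, layout: Layout = LAYOUT) -> Dict[str, str]:
    """Slice one fixed-width line into named fields, trimming the padding."""
    return {name: line[first - 1:last].strip() for name, first, last in layout}

def decode_dump(lines: Iterable[str], layout: Layout = LAYOUT) -> List[Dict[str, str]]:
    """Decode every non-blank line of a dump."""
    return [decode_record(line, layout) for line in lines if line.strip()]

# Usage with a hypothetical record built to match the layout above.
sample = "000042" + "Main St".ljust(20) + "031"
print(decode_record(sample))
```

The point of Eric's comparison holds here: one static layout is easy to publish, but many databases with evolving schemas and real-time updates need many layouts kept in sync.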

Re: Open access to National GIS data

Author: Dave Smith

Eric raises another point. EPA has similar flows, e.g. FRS, where the data comes from a large number of disparate stewards and, given the varying practices and standards of those external stewards, may have a host of issues when it arrives: mismatched datums, reversed lat/long, wrong signs on longitude values, and so on. Further, the representation of the "place" may mean very different discrete things, e.g. a water outfall, an air stack, the front gate outside a plant, and so on. These and other issues need to be harmonized in order to provide a seamless national dataset of regulated facilities. And as with the USGS databases, these are refreshed on a continual basis. So there are hurdles to be overcome before even turning over the data, and that's been half the battle. However, once the data can be brought to this point, the solutions for delivery become a lot more straightforward, at least in today's terms.

It should also be considered that, for example, EPA's web-based GIS applications began life in the 1990s, when current technologies and architectures had not yet been conceived, with many pieces scratch-built. Many functions can be, and are being, replaced with more current technologies; however, again, availability of resources has been an issue. Dealing with complex processes, legacy systems and disparate resources across and outside an enterprise is never as easy as building something new. But hopefully existing efforts and technologies, such as GeoServer, can be employed to provide robust, low-cost infrastructure to serve these types of needs in the future.
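Two of the intake problems Dave lists, reversed lat/long and missing signs on longitude, can often be caught automatically when the expected region is known. The sketch below tries the common repairs and keeps the first that lands inside a rough bounding box for the conterminous US. The box values, function name, and repair labels are assumptions for illustration, not FRS's actual rules; Alaska, Hawaii and territories would need their own boxes, and datum mismatches need a proper coordinate transformation (e.g. via PROJ), which this does not attempt.

```python
from typing import Optional, Tuple

# Rough bounding box for the conterminous US (an assumption for this sketch).
LAT_RANGE = (24.0, 50.0)
LON_RANGE = (-125.0, -66.0)

def _in_box(lat: float, lon: float) -> bool:
    return LAT_RANGE[0] <= lat <= LAT_RANGE[1] and LON_RANGE[0] <= lon <= LON_RANGE[1]

def harmonize_coords(lat: float, lon: float) -> Optional[Tuple[float, float, str]]:
    """Try common data-entry repairs; return (lat, lon, repair applied), or None if none fit."""
    candidates = [
        (lat, lon, "none"),
        (lat, -lon, "negated longitude"),     # sign dropped from a western longitude
        (lon, lat, "swapped lat/lon"),        # columns reversed
        (lon, -lat, "swapped and negated"),   # both problems at once
    ]
    for cand_lat, cand_lon, repair in candidates:
        if _in_box(cand_lat, cand_lon):
            return cand_lat, cand_lon, repair
    return None  # flag the record for manual review

# Usage with hypothetical facility coordinates.
print(harmonize_coords(39.74, 104.99))
print(harmonize_coords(-104.99, 39.74))
```

Recording which repair was applied matters as much as the repair itself, so the steward can see which sources keep sending malformed coordinates.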