Long Live Chicago Crime

The transition of chicagocrime.org has been noted by geo-bloggers, but they are missing a key part of the story: the resources that had been hosted on chicagocrime.org are not dead at all. They have been moved to a new host. Their original URLs have been retired, but the resources themselves carry on. I'm not speaking figuratively. Stability is a virtue, but Web architecture makes it possible for resources to move when necessary. The chicagocrime.org move is a perfectly ordinary transition of resources on the Web.

How is this accomplished? Requests for http://chicagocrime.org are redirected to http://everyblock.com. I don't mean that you get a web page that says, "click here to go to the new web site". I don't mean that http://chicagocrime.org is maintained as a proxy server for content from http://everyblock.com. I mean that the Chicago Crime host responds to your request with a 301 Moved Permanently response:

sean@lenny:~$ curl -v http://chicagocrime.org/
* About to connect() to chicagocrime.org port 80
*   Trying 64.207.128.91... connected
* Connected to chicagocrime.org (64.207.128.91) port 80
> GET / HTTP/1.1
> Host: chicagocrime.org
> Accept: */*
>
< HTTP/1.1 301 Moved Permanently
< Server: nginx/0.5.35
< Date: Tue, 05 Feb 2008 16:57:32 GMT
< Content-Type: text/html
< Content-Length: 185
< Connection: keep-alive
< Location: http://chicago.everyblock.com/crime/
<html>
<head><title>301 Moved Permanently</title></head>
<body bgcolor="white">
<center><h1>301 Moved Permanently</h1></center>
<hr><center>nginx/0.5.35</center>
</body>
</html>
* Connection #0 to host chicagocrime.org left intact
* Closing connection #0

That response tells user agents that the resource formerly found at http://chicagocrime.org/ has been moved to http://chicago.everyblock.com/crime/, and that user agents may consider it to be the same resource. Its representation ("format" to you GIS folks) may have changed a little bit, and the less the better for non-human users, but it should be substantially the same. Somewhat smart user agents, like your web browser, will follow up on this response with a request to the new resource location, making the redirect more or less invisible to you. Smarter user agents might update their bookmarks accordingly.
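To make "update their bookmarks accordingly" concrete, here is a minimal sketch in Python of what a user agent might do with a 301. The function name `apply_permanent_redirect` and the bookmark dictionary are my own inventions for illustration, and the headers are passed as a plain dict with an exact `Location` key; a real client would compare header names case-insensitively.

```python
from urllib.parse import urljoin


def apply_permanent_redirect(bookmarks, requested_url, status, headers):
    """Return the URL to request next, or None if this is not a usable 301.

    `bookmarks` maps a label to a URL and is updated in place.
    """
    if status != 301:
        return None
    location = headers.get("Location")
    if location is None:
        return None
    # Location may be a relative reference, so resolve it against the
    # URL that was originally requested.
    new_url = urljoin(requested_url, location)
    # A 301 is permanent: stored references to the old URL can be replaced.
    for label, url in bookmarks.items():
        if url == requested_url:
            bookmarks[label] = new_url
    return new_url


# Values taken from the curl session above.
bookmarks = {"Chicago Crime": "http://chicagocrime.org/"}
next_url = apply_permanent_redirect(
    bookmarks,
    "http://chicagocrime.org/",
    301,
    {"Location": "http://chicago.everyblock.com/crime/"},
)
print(next_url)
print(bookmarks)
```

A temporary redirect such as a 302 would still be followed by a sensible client, but it would not justify rewriting the stored bookmark, which is why this sketch only acts on 301.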

Look a little more deeply and you'll notice that not only has the root resource been permanently moved, the descendant resources have been moved as well:

sean@lenny:~$ curl -v http://www.chicagocrime.org/zipcodes/60615/
* About to connect() to www.chicagocrime.org port 80
*   Trying 64.207.128.91... connected
* Connected to www.chicagocrime.org (64.207.128.91) port 80
> GET /zipcodes/60615/ HTTP/1.1
> Host: www.chicagocrime.org
> Accept: */*
>
< HTTP/1.1 301 Moved Permanently
< Server: nginx/0.5.35
< Date: Tue, 05 Feb 2008 17:11:30 GMT
< Content-Type: text/html
< Content-Length: 185
< Connection: keep-alive
< Location: http://chicago.everyblock.com/locations/zipcodes/60615/
<html>
<head><title>301 Moved Permanently</title></head>
<body bgcolor="white">
<center><h1>301 Moved Permanently</h1></center>
<hr><center>nginx/0.5.35</center>
</body>
</html>
* Connection #0 to host www.chicagocrime.org left intact
* Closing connection #0

Browse to http://www.chicagocrime.org/zipcodes/60615/ to see this in action. Starting with a sane URL scheme as Chicago Crime did certainly makes the transition a lot easier.
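Why does a sane URL scheme help? Because old paths can be mapped to new ones by pattern rather than one at a time. The following Python sketch is only an illustration of that idea. The two rules are inferred from the two responses shown above, while the `FALLBACK` target and the names `RULES` and `new_location` are my assumptions, not a description of the actual server configuration.

```python
import re

NEW_HOST = "http://chicago.everyblock.com"

# (pattern, replacement) pairs, tried in order; the first match wins.
# Both rules are inferred from the two curl responses above.
RULES = [
    (re.compile(r"^/zipcodes/(\d+)/?$"), r"/locations/zipcodes/\1/"),
    (re.compile(r"^/$"), "/crime/"),
]

# Assumption: anything unrecognized is sent to the crime landing page.
FALLBACK = "/crime/"


def new_location(old_path):
    """Map a path on the old site to an absolute URL on the new one."""
    for pattern, replacement in RULES:
        if pattern.match(old_path):
            return NEW_HOST + pattern.sub(replacement, old_path)
    return NEW_HOST + FALLBACK


print(new_location("/zipcodes/60615/"))
print(new_location("/"))
```

One rule covers every ZIP code, which is the payoff of a regular URL structure: the number of rules grows with the number of kinds of resources, not with the number of resources.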

I can't say for sure that every single resource from chicagocrime.org has been moved in this way, but it wouldn't surprise me if they all have, or if it took no more than 20 lines of configuration on the old server. By the way, HTTP specifies a status code that tells user agents that a resource really is no more: 410 Gone. I wouldn't be surprised if it is employed somewhere on what's left of chicagocrime.org.

The Chicago Crime resources have moved, and the Web as an application goes on without missing a beat, or at least the pieces of it that were only loosely coupled to http://chicagocrime.org do. Any script that was downloading resources from the old site without following redirects is well and truly (and perhaps rightly) broken. Geographic sites and services can migrate or evolve using the same mechanism, one that is built into the Web. Does your GIS "web" service client understand redirects, or is it crazy-glued to those service endpoints?
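For the record, here is roughly what "understanding redirects" means for a client, as a small Python sketch. It takes the HTTP layer as a `fetch` function so it can run without a network. The names `resolve`, `Gone`, and `fake_fetch` are hypothetical, the `/retired/` URL is invented to show the 410 case, and the 200 for the new location is assumed rather than observed.

```python
from urllib.parse import urljoin

REDIRECT_STATUSES = {301, 302, 303, 307, 308}


class Gone(Exception):
    """Raised when the server answers 410 Gone."""


def resolve(url, fetch, max_hops=5):
    """Follow redirects starting at `url` and return the final URL.

    `fetch(url)` must return a (status, headers) pair.
    """
    for _ in range(max_hops + 1):
        status, headers = fetch(url)
        if status == 410:
            # The resource is intentionally, permanently unavailable.
            raise Gone(url)
        if status in REDIRECT_STATUSES and "Location" in headers:
            # Resolve a possibly relative Location and try again.
            url = urljoin(url, headers["Location"])
            continue
        return url
    raise RuntimeError("too many redirects")


def fake_fetch(url):
    """Stand-in for a real HTTP request, for illustration only."""
    table = {
        # Observed in the curl session above.
        "http://www.chicagocrime.org/zipcodes/60615/": (
            301,
            {"Location": "http://chicago.everyblock.com/locations/zipcodes/60615/"},
        ),
        # Assumed: the new location answers normally.
        "http://chicago.everyblock.com/locations/zipcodes/60615/": (200, {}),
        # Hypothetical URL, invented to demonstrate 410 Gone.
        "http://www.chicagocrime.org/retired/": (410, {}),
    }
    return table.get(url, (404, {}))


final_url = resolve("http://www.chicagocrime.org/zipcodes/60615/", fake_fetch)
print(final_url)
```

Many HTTP libraries already do the following part for you. Python's `urllib.request.urlopen`, for one, follows redirects on GET requests by default, so a client usually has to go out of its way to be crazy-glued to an endpoint.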

All of the above is elementary web knowledge, but the Chicago Crime move is a nice lesson in applying resource-oriented thinking to a "GeoWeb" site.

Comments

Re: Long Live Chicago Crime

Author: Paul Smith

Adrian accomplished the move by setting up a virtual host for chicagocrime.org on the front-end HTTP server at EveryBlock (nginx, as seen in the HTTP headers in your curl examples). The virtual host stanza consists almost entirely of lines like the following:
rewrite ^/zipcodes/(\d+)/?$ http://chicago.everyblock.com/locations/zipcodes/$1/ permanent;
It gets a little verbose for the date-based URLs, mainly because of the slight impedance mismatch between chicagocrime.org date-based URLs and their new locations at chicago.everyblock.com. There's a "catch-all" directive for any resource that doesn't match the explicit rewrites. As you suspected, it was a simple matter of configuration, and also of not forgetting that your HTTP server is part of your application, not just a pass-through.

Re: Long Live Chicago Crime

Author: Sean

Thanks for the comment, Paul. Everything I wrote is old hat to web folks, I know, but I thought it was a great opportunity to explain to GIS folks how things can evolve on the web.