wrapping CGI applications in WSGI

We’ve got a large “legacy” body of code that our staff use to track most of our business. It’s a whole lot of Python CGI built on some custom HTML and DB frameworky code. It’s pretty ugly, and having become a convert to the cult of Pylons, WSGI, and SQLAlchemy, I really want to replace it.

Of course, everyone knows that one of the Things You Should Never Do is rewrite from scratch. Even in the same language.

It would be much easier to integrate the old app into a new Pylons app, have them running side-by-side, and slowly deprecate the old one as new interfaces are written. (This is still not a perfect idea, as demonstrated by the 4-year-old TCL code that the current app was meant to replace still running in production ;-) As bugs in the old code are found, we can either beat our heads against brick walls or replace just that functionality with a sane data model, similar-looking templates, and shiny new controller smarts. No one would be the wiser, except of course that for some reason the developers are no longer constantly grumpy and the webapp is running smoother and faster than before, and crashing less often…

It occurred to me yesterday that the best way to get a legacy CGI app to run alongside Pylons is to convert it to a WSGI application, and just mash it in at the bottom of the application stack, where Pylons would normally go when it 404s.
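To make "the bottom of the stack" concrete, here is a minimal sketch of a fallback wrapper, written for Python 3. The name `fallback` and both app arguments are hypothetical and are not part of Pylons; the sketch also buffers the whole primary response and assumes the primary app does not consume the request body before returning a 404.

```python
def fallback(primary, legacy):
    """Return a WSGI app that tries `primary` and, if it answers 404,
    hands the same request to `legacy` instead."""
    def app(environ, start_response):
        captured = {}

        def capture(status, headers, exc_info=None):
            # Hold the response line back until we know who answers.
            captured["args"] = (status, headers, exc_info)
            # WSGI expects start_response to return a write() callable.
            return lambda data: None

        body = primary(environ, capture)
        try:
            chunks = list(body)  # forces the app to call capture()
        finally:
            if hasattr(body, "close"):
                body.close()

        status, headers, exc_info = captured["args"]
        if status.startswith("404"):
            # The new app does not know this URL: let the old one try.
            return legacy(environ, start_response)
        start_response(status, headers, exc_info)
        return chunks
    return app
```

With something like this, every URL the new app handles is served by the new code, and everything else silently falls through to the wrapped CGI scripts.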

Here’s the result of some free time and caffeinated excitement this morning:

import imp
import sys
import StringIO

def application(environ, start_response):
    # trap the exit handler, we don't want scripts exiting randomly
    # we might want to do something with the return code later;
    # a list is used because a plain assignment inside the nested
    # function would only create a new local variable
    retcode = [None]
    def exit_handler(rc=0):
        retcode[0] = rc

    # remember the real objects so they can be put back afterwards
    saved = (sys.exit, sys.stdout, sys.stderr)
    sys.exit = exit_handler

    # trap the output buffer
    outbuf = StringIO.StringIO()
    sys.stdout = outbuf

    # catch stderr output in the parent's error stream
    sys.stderr = environ['wsgi.errors']

    try:
        # import the script
        script = environ['PATH_TRANSLATED']
        f = open(script, "rb")
        try:
            imp.load_module('__main__', f, script, ("py", "rb", imp.PY_SOURCE))
        finally:
            f.close()
    finally:
        # undo the monkeypatching, even if the script raised
        sys.exit, sys.stdout, sys.stderr = saved

    # outbuf has a typical CGI response, headers separated by a double
    # newline, then content
    (header, content) = outbuf.getvalue().split('\n\n', 1)
    headers = [tuple(x.split(': ', 1)) for x in header.split('\n')]

    # return it wsgi style
    start_response('200 OK', headers)
    return [content]
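
The header split at the end of that listing always answers 200 OK and assumes bare-newline line endings. As a hedged sketch of a slightly more forgiving parser (Python 3, working on bytes), the hypothetical helper `parse_cgi_output` below also honours the CGI `Status:` header and accepts CRLF separators. It is an illustration, not what our code does.

```python
def parse_cgi_output(raw):
    """Split raw CGI output (bytes) into (status, headers, body)."""
    # Find whichever blank-line separator appears first in the output.
    found = [(raw.find(sep), sep) for sep in (b"\r\n\r\n", b"\n\n")]
    found = [(pos, sep) for pos, sep in found if pos != -1]
    if not found:
        raise ValueError("no blank line between headers and body")
    pos, sep = min(found)
    head, body = raw[:pos], raw[pos + len(sep):]

    status = "200 OK"  # CGI default when the script sends no Status header
    headers = []
    for line in head.decode("latin-1").splitlines():
        name, _, value = line.partition(":")
        name, value = name.strip(), value.strip()
        if name.lower() == "status":
            # "Status: 404 Not Found" becomes the WSGI status line.
            status = value
        else:
            headers.append((name, value))
    return status, headers, body
```

The returned triple can be passed along as `start_response(status, headers)` followed by `return [body]`.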

Our CGI apps print to stdout, as you’d expect, so we need to trap that; here it’s done with a StringIO monkeypatched over sys.stdout. We also need to hack sys.exit out of the way, so that the CGIs don’t quit before we’ve completed the WSGI protocol. (I think this might cause some bugs in the execution though, because now it’s not terminating execution of the module, but I haven’t found an example yet to bother worrying about it.)
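
One possible way around that worry, sketched here for Python 3 rather than taken from our code: leave sys.exit alone and catch the SystemExit exception it raises, so the script really does stop where it asked to. The helper name `run_captured` is made up for this example, and because sys.stdout is process-wide the approach is not safe if several requests run in threads at once.

```python
import contextlib
import io

def run_captured(func):
    """Call func(), returning (printed_text, exit_code)."""
    buf = io.StringIO()
    exit_code = None
    # redirect_stdout swaps sys.stdout and restores it on the way out.
    with contextlib.redirect_stdout(buf):
        try:
            func()
        except SystemExit as exc:
            # sys.exit() raises SystemExit; catching it here ends the
            # script at that point without taking the server down.
            exit_code = exc.code
    return buf.getvalue(), exit_code
```

Anything the script prints after its call to sys.exit never reaches the buffer, which matches what the script did when it ran as a real CGI process.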

I import the script, rather than using os.system, because it feels right. I use imp.load_module rather than import because we don’t know what the script is until runtime :)

The real trick comes from a tip I found here, whilst looking for how to run the imported module as __main__. Just imp.load_module and tell it that it’s __main__! Simple!
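
For anyone reading this on a recent Python: the imp module was deprecated for years and is gone as of Python 3.12. A rough equivalent of the same trick, as a sketch, is the standard library's runpy.run_path with run_name set to "__main__". The wrapper name `run_as_main` is only for illustration.

```python
import runpy

def run_as_main(path):
    """Execute the file at `path` as if it had been run as a script."""
    # run_name="__main__" makes `if __name__ == "__main__":` blocks fire.
    # The return value is the globals dict the script ended up with.
    return runpy.run_path(path, run_name="__main__")
```

Unlike the imp version, this takes a filesystem path directly, so there is no file handle to open and close around the call.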

(The hardest part of this whole exercise was fiddling with sys.path and the CWD to make sure the imported script ran with the environment that the CGIs used to expect. This is all done in the CGI runner dispatch.cgi, which I won’t copy here because it’s pretty trivial and well documented in the WSGI spec.)
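
To give a flavour of the sys.path and CWD fiddling without reproducing dispatch.cgi, here is a small Python 3 sketch. The context manager `script_environment` is hypothetical, and it assumes the old CGIs expected to run from their own directory with that directory first on the import path.

```python
import contextlib
import os
import sys

@contextlib.contextmanager
def script_environment(script_path):
    """Temporarily run with the script's directory as CWD and on sys.path."""
    script_dir = os.path.dirname(os.path.abspath(script_path))
    old_cwd = os.getcwd()
    old_path = list(sys.path)
    os.chdir(script_dir)
    sys.path.insert(0, script_dir)
    try:
        yield script_dir
    finally:
        # Put everything back so the next request starts clean.
        os.chdir(old_cwd)
        sys.path[:] = old_path
```

The script would then be loaded inside a `with script_environment(script):` block, so the host application's own working directory and import path are untouched once the request is done.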