spaceblog

wrapping CGI applications in WSGI

We’ve got a large “legacy” body of code that our staff use to track most of our business. It’s a whole lot of Python CGI built on some custom HTML and DB frameworky code. It’s pretty ugly, and having become a convert to the cult of Pylons, WSGI, and SQLAlchemy, I really want to replace it.

Of course, anyone knows that one of the Things You Should Never Do is rewrite from scratch. Even in the same language.

It would be much easier to integrate the old app into a new Pylons app, have them running side-by-side, and slowly deprecate the old one as new interfaces are written. (This is still not a perfect idea, as demonstrated by the 4-year-old Tcl code that the current app was meant to replace still running in production ;-) As bugs in the old code are found, we can either beat our heads against brick walls or replace just that functionality with a sane data model, similar-looking templates, and shiny new controller smarts. No-one would be the wiser, except of course that for some reason the developers are no longer constantly grumpy and the webapp is running smoother and faster than before, and crashing less often…

It occurred to me yesterday that the best way to get a legacy CGI app to run alongside Pylons is to convert it to a WSGI application and just mash it in at the bottom of the application stack, where a request would normally end up when Pylons 404s.
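A minimal sketch of that fall-through idea, written for Python 3 with only the standard library. The names `cascade`, `new_app` and `legacy_app` are hypothetical stand-ins for illustration, not Pylons APIs, and the sketch ignores `exc_info` and the legacy `write()` callable:

```python
def cascade(primary, fallback):
    """Return a WSGI app that tries `primary` and uses `fallback` on a 404."""
    def app(environ, start_response):
        captured = {}

        def capture(status, headers, exc_info=None):
            # Hold the primary app's status until we know it is not a 404.
            captured["status"] = status
            captured["headers"] = headers
            return lambda data: None  # write() callable, unused in this sketch

        result = primary(environ, capture)
        body = list(result)  # force generator-style apps to run
        if hasattr(result, "close"):
            result.close()

        if captured["status"].startswith("404"):
            # The new app doesn't know this URL: hand it to the old one.
            return fallback(environ, start_response)

        start_response(captured["status"], captured["headers"])
        return body
    return app


def new_app(environ, start_response):
    # Hypothetical "new" application that only knows about /new.
    if environ["PATH_INFO"] == "/new":
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [b"new"]
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"not found"]


def legacy_app(environ, start_response):
    # Hypothetical wrapped CGI application.
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"legacy"]


combined = cascade(new_app, legacy_app)
```

One caveat with this approach: if the primary app reads the request body from `wsgi.input` before returning its 404, the fallback will find it already consumed.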

Here’s the result of some free time and caffeinated excitement this morning:

import imp
import sys
import StringIO

def application(environ, start_response):
    # trap the exit handler, we don't want scripts exiting randomly
    # we might want to do something with the return code later
    # (kept in a list, because a Python 2 closure can't rebind an
    # outer local name)
    retcode = [None]
    def exit_handler(rc=None):
        retcode[0] = rc

    # remember the real objects so we can put them back afterwards
    saved = (sys.exit, sys.stdout, sys.stderr)
    sys.exit = exit_handler

    # trap the output buffer
    outbuf = StringIO.StringIO()
    sys.stdout = outbuf

    # catch stderr output in the parent's error stream
    sys.stderr = environ['wsgi.errors']

    try:
        # import the script
        script = environ['PATH_TRANSLATED']
        f = open(script, "rb")
        try:
            imp.load_module('__main__', f, script, ("py", "rb", imp.PY_SOURCE))
        finally:
            f.close()
    finally:
        # undo the monkeypatching, even if the script raised
        (sys.exit, sys.stdout, sys.stderr) = saved

    # outbuf has a typical CGI response, headers separated by a double
    # newline, then content
    (header, content) = outbuf.getvalue().split('\n\n', 1)
    headers = [tuple(x.split(': ', 1)) for x in header.split('\n')]

    # return it wsgi style
    start_response('200 OK', headers)
    return [content]

Our CGI apps print to stdout, as you’d expect, so we need to trap that; here it’s done with a StringIO monkeypatched over sys.stdout. We also need to hack sys.exit out of the way, so that the CGIs don’t quit before we’ve completed the WSGI protocol. (I think this might cause some bugs though, because sys.exit no longer terminates execution of the module, but I haven’t found an example yet to bother worrying about it.)
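On that worry about scripts carrying on past sys.exit: one alternative, sketched here for Python 3 rather than the Python 2 used above, is to leave sys.exit alone and catch the SystemExit exception it raises, so the script stops where it asked to but the server keeps running. This swaps in `runpy.run_path` and `contextlib.redirect_stdout` in place of imp.load_module and the manual sys.stdout patch; `run_script` is a hypothetical helper name.

```python
import contextlib
import io
import os
import runpy
import tempfile


def run_script(path):
    """Run `path` as __main__; return (captured stdout, exit code or None)."""
    outbuf = io.StringIO()
    retcode = None
    with contextlib.redirect_stdout(outbuf):
        try:
            runpy.run_path(path, run_name="__main__")
        except SystemExit as e:
            # sys.exit() raises SystemExit; catching it here ends the script
            # at that point without taking the server process down.
            retcode = e.code
    return outbuf.getvalue(), retcode


# Demo with a throwaway script that exits halfway through.
with tempfile.TemporaryDirectory() as tmp:
    script = os.path.join(tmp, "demo.py")
    with open(script, "w") as f:
        f.write("import sys\nprint('before')\nsys.exit(3)\nprint('after')\n")
    output, code = run_script(script)
```

Here only the first print runs, and the exit code is available to the wrapper afterwards.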

I import the script, rather than using os.system, because it feels right. I use imp.load_module rather than import because we don’t know what the script is until runtime :)

The real trick comes from a tip I found here, whilst looking for how to run the imported module as __main__. Just imp.load_module and tell it that it’s __main__! Simple!

(The hardest part of this whole exercise was fiddling with sys.path and the CWD to make sure the imported script ran with the environment that the CGIs used to expect. This is all done in the CGI runner dispatch.cgi, which I won’t copy here because it’s pretty trivial and well documented in the WSGI spec.)
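For a rough idea of what that fiddling might look like, here is a Python 3 sketch of a context manager that switches the working directory and sys.path for the duration of a script and restores both afterwards. It is not the contents of dispatch.cgi; `script_environment` is a hypothetical name, and the assumption that a CGI expects its own directory as the CWD and at the front of sys.path is mine.

```python
import contextlib
import os
import sys
import tempfile


@contextlib.contextmanager
def script_environment(script_path):
    """Temporarily chdir to the script's directory and put it first on sys.path."""
    script_dir = os.path.dirname(os.path.abspath(script_path))
    old_cwd = os.getcwd()
    old_path = list(sys.path)
    os.chdir(script_dir)
    sys.path.insert(0, script_dir)
    try:
        yield script_dir
    finally:
        # Put everything back, even if the script raised.
        os.chdir(old_cwd)
        sys.path[:] = old_path


# Demo: the environment changes inside the block and is restored after it.
with tempfile.TemporaryDirectory() as tmp:
    before = os.getcwd()
    with script_environment(os.path.join(tmp, "legacy.py")) as script_dir:
        inside = os.getcwd()
        first_on_path = sys.path[0]
    after = os.getcwd()
```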

more than a feeling

I woke up this morning, and the sun was gone / Turned on some music to start my day…

For a while, I’ve wanted to be woken up by anything other than my clock radio, so last night I peeked at banshee to see if it had a remote control… turns out it does!

Hacked up this script, shins:

#!/bin/sh

DISPLAY=:0
XAUTHORITY=/home/jaq/.Xauthority
export DISPLAY XAUTHORITY
banshee --enqueue /media/usbdisk/music0/Albums/The\ Shins/Chutes\ Too\ Narrow/01\ -\ Kissing\ the\ Lipless.ogg
banshee --play

and set it to run at 9am:

dawn% at 9am
warning: commands will be executed using /bin/sh
at> sh shins
at> <EOT>
job 6 at Sun Sep 10 09:00:00 2006

and this morning I was woken to the soft sounds of The Shins, just as planned. Great start to the day!

pylons gotchas

Benno was over yesterday, hacking on the LCA 2007 website with me, and we hit two gotchas. I already knew about both, but they sounded silly once I had to explain them to him.

c considered harmful.

c is a request-local global object that you can attach objects to, which is useful as a way of passing data from the controller to the template code. When you’re calling a parameterised template you might not know at call time which args the template wants, but you can pass them all in on c. If you’re using some pattern like a mixin CRUD class for generalising common data operations, then the code that actually calls the template doesn’t know what the object is, but the template it’s calling does.

c has the magical property that it has overloaded __getattr__ to return an empty string if the attribute is not found. This is a mixed blessing; your templates can access an attribute that hasn’t been attached and it’ll mostly cope with it. (Problems happen when you try to access attributes of nonexistent attributes, and you get the confusing message ‘str has no attribute X’.)

However, this means you hide bugs: you forget to attach the object you want to c, and your code still runs fine. It’s the users who find the problem after deployment, not you during development. A __getattr__ that throws exceptions means you find out about these problems a lot sooner.
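The difference is easy to see with two toy attribute bags (a Python 3 sketch; `LenientContext` and `StrictContext` are hypothetical classes that mimic the behaviour described here, not the Pylons implementation of c):

```python
class LenientContext(object):
    """Missing attributes quietly come back as an empty string."""
    def __getattr__(self, name):
        # __getattr__ is only called when normal attribute lookup fails.
        return ""


class StrictContext(object):
    """Missing attributes raise, so a forgotten assignment fails loudly."""
    def __getattr__(self, name):
        raise AttributeError("nothing attached to c.%s" % name)


lenient = LenientContext()
strict = StrictContext()
lenient.title = "Reviews"
strict.title = "Reviews"

# The controller forgot to attach `reviews` in both cases.
print(repr(lenient.reviews))  # the omission goes unnoticed
try:
    strict.reviews
except AttributeError as e:
    print("caught:", e)       # the omission shows up during development
```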

I think both of these points show that c in general is a bad idea; you should make use of explicit args so that your template interface is clearly defined – I haven’t yet found a nice way of doing it that is as easy as or better than using c though.

Myghty expressions that evaluate to False return empty strings.

We had a simple construct like so:

    Count: <% len(c.review_collection) %>

which has the interesting property of evaluating to ’’ when c.review_collection is empty: len() returns 0, which is false in a boolean context.

This is pretty broken; I suspect there’s a shortcut along the lines of:

    content = evaluate_fragment("len(c.review_collection)")
    if content:
        write(content)

when these inline blocks are rendered; the if block clearly will fail to trigger when the inline block evaluates to 0, False, [], or {}. I can’t think of a case where this is a good thing.
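To make the suspected behaviour concrete, here is a small Python 3 sketch comparing a truthiness test with an explicit None test. Both functions are hypothetical illustrations of the guess above, not Myghty’s actual rendering code:

```python
def render_truthy(value):
    """Suspected behaviour: anything falsy is silently dropped."""
    return str(value) if value else ""


def render_not_none(value):
    """Alternative: only None means 'nothing to write'."""
    return str(value) if value is not None else ""


reviews = []  # an empty collection, so len() is 0

count_truthy = "Count: " + render_truthy(len(reviews))
count_not_none = "Count: " + render_not_none(len(reviews))
```

With the truthiness test the zero disappears from the output; testing against None keeps it, while still letting an expression that returns None render as nothing.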

The workaround is to wrap the len() call in str(), so that the fragment doesn’t evaluate to false.

    <% str(len(c.review_collection)) %>

More gotchas as they come to hand.

LCA 2007 proposals so far

I’m having a browse of the submissions to LCA that we’ve got so far, and there’s some cool stuff in there!

There’s still a little over 2 weeks for you to get your proposals in, so don’t hold back! I’m sure you all have something very exciting you want to talk about at the conference!

The more submissions we get, the rockin’er the conference will be!

Dad, I dug another hole...

I wrote another mock object, this time replacing urlopen from urllib2.

import urllib2
import StringIO
import unittest

class Dummy_urllib2(object):
    """Stand-in for urllib2.urlopen that records the last url and data."""

    @classmethod
    def install(cls):
        # replace the real urlopen for the rest of the process
        urllib2.urlopen = cls.urlopen

    @classmethod
    def urlopen(cls, url, data=None):
        # remember what we were called with so tests can inspect it
        cls.url = url
        cls.data = data

        # a file-like response with a canned body
        response = StringIO.StringIO("foo")

        def geturl():
            return url

        response.geturl = geturl

        def info():
            return {}

        response.info = info

        return response


class TestDummy_urllib2(unittest.TestCase):
    def test_install(self):
        Dummy_urllib2.install()

        url = 'http://notfound.example.org'

        try:
            r = urllib2.urlopen(url)
        except urllib2.URLError, e:
            self.fail("URLError raised, Dummy_urllib2 not installed or failed: %s" % e)

        self.assertEqual(url, Dummy_urllib2.url)
        self.assertEqual(url, r.geturl())
        self.assertEqual(None, Dummy_urllib2.data)
        self.assertEqual("foo", r.read())

if __name__ == '__main__':
    unittest.main()

This time it comes with its own test suite. How meta!
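For comparison, a sketch of the same idea in Python 3, where urllib2.urlopen has become urllib.request.urlopen. It uses `unittest.mock.patch` from the standard library, a different technique from the hand-rolled class above, which puts the real urlopen back when the `with` block ends. `FakeResponse` and the test names are hypothetical.

```python
import io
import unittest
import urllib.request
from unittest import mock


class FakeResponse(io.BytesIO):
    """File-like canned response with the geturl() and info() methods."""
    def __init__(self, url, data=None):
        super().__init__(b"foo")
        self._url = url

    def geturl(self):
        return self._url

    def info(self):
        return {}


class TestFakeUrlopen(unittest.TestCase):
    def test_patched(self):
        url = "http://notfound.example.org"
        # side_effect makes the mock build a FakeResponse from its arguments.
        with mock.patch("urllib.request.urlopen", side_effect=FakeResponse) as fake:
            r = urllib.request.urlopen(url)
        fake.assert_called_once_with(url)
        self.assertEqual(url, r.geturl())
        self.assertEqual(b"foo", r.read())

    def test_restored(self):
        # Outside the with block the real urlopen is back in place.
        self.assertNotIsInstance(urllib.request.urlopen, mock.Mock)
```

Saved to a file, this can be run with `python -m unittest` followed by the file name.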