We’ve got a large “legacy” body of code that our staff use to track
most of our business. It’s a whole lot of Python CGI built on some
custom HTML and DB framework-ish code, and it’s pretty ugly; having
become a convert to the cult of Pylons, WSGI, and SQLAlchemy, I
really want to replace it.
Of course, everyone knows that one of the Things You Should Never Do
is rewriting from scratch. Even in the same language.
It would be much easier to integrate the old app into a new Pylons
app, have them running side-by-side, and slowly deprecate the old one
as new interfaces are written. (This still isn’t a perfect plan, as
demonstrated by the 4-year-old TCL code, which the current app was
meant to replace, still running in production ;-) As bugs in the old
code are found, we can either beat our heads against brick walls or
replace just that functionality with a sane data model, similar-looking
templates, and shiny new controller smarts. No-one will be the wiser,
except of course that for some reason the developers are no longer
constantly grumpy, and the webapp runs smoother and faster than before
and crashes less often…
It occurred to me yesterday that the best way to get a legacy CGI app
to run alongside Pylons is to convert it to a WSGI application and just
mash it in at the bottom of the application stack. That way any request
Pylons would answer with a 404 falls through to the old app instead.
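The "fall through on 404" arrangement can be written as a small piece of WSGI middleware. The following is a Python 3 sketch, not the code this setup actually uses. `fall_through_on_404`, `new_app` and `legacy_app` are hypothetical names, and the two apps are stand-ins for Pylons and the wrapped CGI app. Paste’s `paste.cascade.Cascade`, which Pylons’ generated middleware already uses to serve static files, is a more complete version of the same idea.

```python
from typing import Callable, Iterable, List

# Simplified WSGI types for this sketch.
WSGIApp = Callable[[dict, Callable], Iterable[bytes]]


def fall_through_on_404(primary: WSGIApp, fallback: WSGIApp) -> WSGIApp:
    """Serve from `primary`, but hand its 404s to `fallback` (hypothetical helper)."""

    def app(environ, start_response):
        captured = {}
        written: List[bytes] = []

        def capture(status, headers, exc_info=None):
            # Hold the response back until we know whether it is a 404.
            captured["status"] = status
            captured["headers"] = headers
            captured["exc_info"] = exc_info
            return written.append  # buffer anything sent via write()

        result = primary(environ, capture)
        try:
            # Buffer the whole body so that generator-based apps have
            # called start_response before we inspect the status.
            body = list(result)
        finally:
            if hasattr(result, "close"):
                result.close()

        if captured["status"].startswith("404"):
            # The new app had nothing for this URL: let the legacy app try.
            return fallback(environ, start_response)
        start_response(captured["status"], captured["headers"], captured["exc_info"])
        return written + body

    return app


def new_app(environ, start_response):
    # Stand-in for the Pylons app: it only knows about /new.
    if environ.get("PATH_INFO") == "/new":
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [b"new code"]
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"not found"]


def legacy_app(environ, start_response):
    # Stand-in for the wrapped legacy CGI application.
    start_response("200 OK", [("Content-Type", "text/html")])
    return [b"legacy code"]


stack = fall_through_on_404(new_app, legacy_app)
```

This sketch buffers the primary app’s whole response and does nothing about request bodies. If the first app reads `wsgi.input`, the fallback will find it already consumed.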
Here’s the result of some free time and caffeinated excitement this
morning:

    import imp
    import sys
    import StringIO

    def application(environ, start_response):
        # trap the exit handler, we don't want scripts exiting randomly;
        # we might want to do something with the return code later.
        # Python 2 has no 'nonlocal', so the code is kept in a list.
        retcode = [None]
        def exit_handler(rc=None):
            retcode[0] = rc

        # remember the real objects so they can be put back afterwards
        saved = (sys.exit, sys.stdout, sys.stderr)
        sys.exit = exit_handler
        # trap the output buffer
        outbuf = StringIO.StringIO()
        sys.stdout = outbuf
        # catch stderr output in the parent's error stream
        sys.stderr = environ['wsgi.errors']
        try:
            # import the script, running it as __main__
            script = environ['PATH_TRANSLATED']
            f = open(script, "rb")
            try:
                imp.load_module('__main__', f, script,
                                (".py", "rb", imp.PY_SOURCE))
            finally:
                f.close()
        finally:
            # undo the monkeypatching so the server itself isn't affected
            sys.exit, sys.stdout, sys.stderr = saved

        # outbuf has a typical CGI response, headers separated by a double
        # newline, then content
        (header, content) = outbuf.getvalue().split('\n\n', 1)
        headers = [tuple(x.split(': ', 1)) for x in header.split('\n')]
        # return it wsgi style
        start_response('200 OK', headers)
        return [content]

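The application above always answers `200 OK` and expects bare `\n` line endings. Below is a Python 3 sketch of a somewhat more careful parser for the captured output; `parse_cgi_output` is a hypothetical name. It honors CGI’s `Status:` header, accepts `\r\n` as well as `\n`, and treats a `Location` header without a `Status` as a redirect. That last rule is based on my reading of the CGI spec (RFC 3875). The sketch does not implement the spec’s server-side "local redirect" case.

```python
import re
from typing import List, Optional, Tuple


def parse_cgi_output(raw: str) -> Tuple[str, List[Tuple[str, str]], str]:
    """Split captured CGI output into a WSGI (status, headers, body) triple."""
    # Headers end at the first blank line; accept both \n and \r\n endings.
    parts = re.split(r"\r?\n\r?\n", raw, maxsplit=1)
    head = parts[0]
    body = parts[1] if len(parts) > 1 else ""

    status: Optional[str] = None
    headers: List[Tuple[str, str]] = []
    for line in head.splitlines():
        if not line.strip():
            continue
        name, _, value = line.partition(":")
        name, value = name.strip(), value.strip()
        if name.lower() == "status":
            # CGI's Status pseudo-header becomes the WSGI status line
            # instead of being passed through as an ordinary header.
            status = value
        else:
            headers.append((name, value))

    if status is None:
        # With no Status header, a Location header means a redirect.
        has_location = any(n.lower() == "location" for n, _ in headers)
        status = "302 Found" if has_location else "200 OK"
    return status, headers, body
```

In the application, the two header-splitting lines would become `status, headers, content = parse_cgi_output(outbuf.getvalue())` followed by `start_response(status, headers)`. On Python 3 the body also has to be encoded to bytes before it is returned.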
Our CGI apps print their output on stdout, as you’d expect, so we need
to trap that; here that’s done with a StringIO monkeypatched over
sys.stdout. We also need to hack sys.exit out of the way, so that the
CGIs don’t quit before we’ve completed the WSGI protocol. (I think this
might cause some bugs, because the replacement no longer stops
execution of the module. A script that prints an error page and then
calls sys.exit() will carry on running whatever comes after it. I
haven’t yet found a case that makes it worth worrying about, though.)
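One way around that worry is to not replace sys.exit at all. `sys.exit()` works by raising `SystemExit`, so catching that exception stops the script exactly where it exits while still handing control back to us. Here is a Python 3 sketch; `run_cgi_source` is a hypothetical helper, and `contextlib.redirect_stdout` stands in for the StringIO monkeypatch. Like the original, it swaps process-wide `sys.stdout`, so it assumes one request at a time per process.

```python
import contextlib
import io
from typing import Tuple


def run_cgi_source(source: str, filename: str = "<cgi>") -> Tuple[str, object]:
    """Run CGI-style Python source; return (captured stdout, exit code)."""
    out = io.StringIO()
    exit_code = None
    with contextlib.redirect_stdout(out):
        try:
            exec(compile(source, filename, "exec"), {"__name__": "__main__"})
        except SystemExit as exc:
            # sys.exit() raised SystemExit, so the script really stopped here;
            # we just record its code and carry on with the WSGI response.
            exit_code = exc.code
    return out.getvalue(), exit_code


legacy_source = (
    "import sys\n"
    "print('Content-Type: text/plain')\n"
    "print()\n"
    "print('error page')\n"
    "sys.exit(3)\n"
    "print('this line never runs')\n"
)
output, code = run_cgi_source(legacy_source)
```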
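I import the script, rather than using os.system, because it feels
right. (Running it in-process is also what makes the stdout trap work:
a child process started by os.system would write straight to the real
stdout, not to our StringIO.) I use imp.load_module rather than import
because we don’t know what the script is until runtime :)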
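On Python 3, the imp module is deprecated (and removed in 3.12). A rough equivalent of `imp.load_module` for a path only known at runtime is importlib’s spec/loader API. This is a sketch; `load_script` and the `report.cgi` file are made up for illustration. It passes an explicit `SourceFileLoader` on the assumption that the scripts may not end in `.py`, because without a loader `spec_from_file_location` would not know how to load them.

```python
import importlib.machinery
import importlib.util
import os
import tempfile
import types


def load_script(path: str, module_name: str) -> types.ModuleType:
    """Execute the Python source at `path` as a fresh module named `module_name`."""
    # An explicit SourceFileLoader accepts files that don't end in .py.
    loader = importlib.machinery.SourceFileLoader(module_name, path)
    spec = importlib.util.spec_from_file_location(module_name, path, loader=loader)
    module = importlib.util.module_from_spec(spec)
    # Runs the file's top-level code. The module is deliberately NOT added
    # to sys.modules, so each call gets a fresh namespace.
    loader.exec_module(module)
    return module


# Usage: the path is only known at runtime, so a plain `import` can't be used.
with tempfile.TemporaryDirectory() as tmp:
    script_path = os.path.join(tmp, "report.cgi")
    with open(script_path, "w") as fh:
        fh.write("answer = 6 * 7\nname_seen = __name__\n")
    report = load_script(script_path, "report")
```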
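The real trick comes from a tip I found here, whilst looking for how to
run the imported module as __main__: just call imp.load_module and tell
it that the module is called __main__! Simple! That way the usual
if __name__ == '__main__': block at the bottom of each script still
runs, just as it did under plain CGI.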
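The standard library’s `runpy.run_path` (Python 2.7 and 3) does this "run the file as `__main__`" step directly. It temporarily installs the script as `sys.modules['__main__']` (and adjusts `sys.argv[0]`), then restores the originals afterwards. In contrast, `imp.load_module` with the name `'__main__'` reuses the already-loaded `__main__` module, much like a reload. This is a Python 3 sketch with a swapped-in technique; `run_as_main` and `legacy.cgi` are hypothetical.

```python
import os
import runpy
import tempfile


def run_as_main(path: str) -> dict:
    """Run the script at `path` as `python path` would; return its globals."""
    # run_name="__main__" makes `if __name__ == "__main__":` blocks fire,
    # and runpy puts the host process's own __main__ back afterwards.
    return runpy.run_path(path, run_name="__main__")


with tempfile.TemporaryDirectory() as tmp:
    script_path = os.path.join(tmp, "legacy.cgi")
    with open(script_path, "w") as fh:
        fh.write(
            "def main():\n"
            "    return 'ran main'\n"
            "result = None\n"
            "if __name__ == '__main__':\n"
            "    result = main()\n"
        )
    script_globals = run_as_main(script_path)
```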
(The hardest part of this whole exercise turned out to be fiddling with
sys.path and the CWD to make sure the imported script ran in the
environment the CGIs expected. That’s all done in the CGI runner,
dispatch.cgi, which I won’t copy here because it’s pretty trivial and
well documented in the WSGI spec.)
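Whatever dispatch.cgi actually looks like, the "right environment" part can be sketched generically as a context manager. This is a Python 3 sketch, and `legacy_cgi_environment` is a hypothetical name. It changes into the script’s directory, puts that directory first on `sys.path`, and copies plain-string CGI variables into `os.environ`, then restores all three. Like the rest of this approach, it mutates process-wide state. Scripts that read POST data through the `cgi` module would also need stdin pointed at the request body, which this sketch leaves out.

```python
import contextlib
import os
import sys
import tempfile


@contextlib.contextmanager
def legacy_cgi_environment(script_path, cgi_vars):
    """Give a legacy CGI script the process state it expects, then undo it."""
    script_dir = os.path.dirname(os.path.abspath(script_path))
    old_cwd = os.getcwd()
    old_path = list(sys.path)
    old_environ = dict(os.environ)
    try:
        os.chdir(script_dir)
        sys.path.insert(0, script_dir)
        # A WSGI environ also holds non-string objects (wsgi.input, ...);
        # only plain-string CGI variables belong in the process environment.
        os.environ.update({k: v for k, v in cgi_vars.items()
                           if isinstance(v, str) and not k.startswith("wsgi.")})
        yield script_dir
    finally:
        os.chdir(old_cwd)
        sys.path[:] = old_path
        os.environ.clear()
        os.environ.update(old_environ)


# Usage: a pretend request for a script living in a scratch directory.
with tempfile.TemporaryDirectory() as tmp:
    fake_environ = {"QUERY_STRING": "id=7", "wsgi.input": object()}
    path_before = list(sys.path)
    with legacy_cgi_environment(os.path.join(tmp, "report.cgi"), fake_environ) as where:
        cwd_inside = os.getcwd()
        query_inside = os.environ.get("QUERY_STRING")
    path_after = list(sys.path)
```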