spaceblog

getting the shits with openldap and writing my own multi-master ldap server

We’re migrating our whole authentication system to LDAP: we’ve got a master OpenLDAP machine hidden away, and each of the public-facing servers (i.e. webservers and mailservers) uses syncrepl to keep its own copy up to date. The failover design is pretty good, but it comes at a cost: we need OpenLDAP 2.2, and that means building the RPMs by hand for RHEL3, RHEL4, RHEL2.1 and RHL7.3.

Needless to say, this sucks.

So it’s 5pm last night, Friday, and I’m fighting my way through backporting our already-backported RHEL3 SRPM to RHL7.3, and the versioned dependencies are killing me. Worst of all are the build-system dependencies, GAR! Backporting libtool just to build the fucking thing? Uncool.

I bitched to Benno, and we got to thinking: Tridge has written an LDAP-like library, and it’s cool, it’s fast, but it’s not a server. What would you need on top of that? Samba 4? :-) Well, it’d be nice to have a package designed to be buildable on legacy operating system versions, so I could get my job done. Python would be good for prototyping (but to win mindshare in the ricer demographic we’d need to write the final version in C, for maximal -funroll-loops impact :-))

You’d also want compatibility with OpenLDAP: communication over the ldapi://, ldap:// and ldaps:// protocols, and an ABI-compatible client library so you could just drop it in on top. Strict schema checking only, with a collection of excellent (and non-conflicting!) schemas to use.
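Here’s a rough idea of the URL handling a listener would need, as a Python sketch. The listen_address() helper and its return shape are hypothetical, not from any existing server. It assumes the usual OpenLDAP convention that ldapi:// puts a percent-encoded Unix socket path where the host would go, plus the standard ports 389 for ldap:// and 636 for ldaps://.

```python
from urllib.parse import urlsplit, unquote

# Well-known TCP ports for the two network schemes.
DEFAULT_PORTS = {"ldap": 389, "ldaps": 636}


def listen_address(url):
    """Map an LDAP URL to (socket family, address, use_tls) for a hypothetical listener."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    if scheme == "ldapi":
        # ldapi:// carries a percent-encoded Unix-domain socket path in the host slot.
        path = unquote(parts.netloc)
        if not path:
            # Real servers typically fall back to a build-time default; this sketch won't guess.
            raise ValueError("ldapi:// URL needs an explicit socket path")
        return ("unix", path, False)
    if scheme in DEFAULT_PORTS:
        host = parts.hostname or ""  # empty host means "all interfaces"
        port = parts.port or DEFAULT_PORTS[scheme]
        return ("tcp", (host, port), scheme == "ldaps")
    raise ValueError("unsupported scheme: %r" % scheme)
```

The server would then bind a socket of the returned family and, for ldaps://, wrap each accepted connection in TLS.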

SSL mode would absolutely need to do bidirectional certificate checking. No exceptions.
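In Python’s ssl module, “no exceptions” mostly comes down to setting verify_mode = ssl.CERT_REQUIRED on the server side, which the default server-side context doesn’t do for you. A minimal sketch, assuming the CA bundle path is passed in (server_context() and client_context() are names made up for illustration):

```python
import ssl


def server_context(ca_file=None):
    """Server side: insist on a client certificate signed by a trusted CA."""
    ctx = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH, cafile=ca_file)
    ctx.verify_mode = ssl.CERT_REQUIRED  # handshake fails if the client sends no valid cert
    return ctx


def client_context(ca_file=None):
    """Client side: verify the server's certificate and hostname."""
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=ca_file)
    # These are already the defaults for SERVER_AUTH; spelled out so nobody "relaxes" them.
    ctx.check_hostname = True
    ctx.verify_mode = ssl.CERT_REQUIRED
    return ctx
```

Each side would also call ctx.load_cert_chain(certfile, keyfile) to present its own certificate, and the server wraps accepted sockets with ctx.wrap_socket(conn, server_side=True). A client that can’t present a certificate signed by the CA then fails the handshake instead of getting an anonymous session.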

I’d like such a server to carry the same pride that small, safe, secure daemons like dovecot and vsftpd have: secure programming techniques, simple configuration, and sticking to the desired feature set.

Plenty of tests in the test suite :-) Write it as a single-threaded state machine; no threads whatsoever.
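What a single-threaded state machine means in practice: every socket is non-blocking, and each connection keeps a buffer plus a tiny parser that only hands a request to the server once a whole LDAPMessage has arrived. Here’s a sketch of that framing layer in Python, relying on LDAP’s BER encoding (every message is a SEQUENCE, tag 0x30, with a definite length). The LdapFramer class is hypothetical, and the 4-byte cap on the length field is an arbitrary safety limit of this sketch.

```python
class LdapFramer:
    """Accumulates bytes read from a (non-blocking) connection and yields complete
    BER-encoded LDAPMessage PDUs. Nothing here blocks or touches a socket."""

    MAX_LENGTH_OCTETS = 4  # sketch limit: refuse lengths that need more than 4 bytes

    def __init__(self):
        self._buf = bytearray()

    def feed(self, data):
        """Append newly received bytes; return a list of every PDU now complete."""
        self._buf.extend(data)
        pdus = []
        while True:
            total = self._pdu_length()
            if total is None:  # need more bytes before the next PDU is complete
                break
            pdus.append(bytes(self._buf[:total]))
            del self._buf[:total]
        return pdus

    def _pdu_length(self):
        """Total size (header + body) of the PDU at the front of the buffer, or None."""
        buf = self._buf
        if len(buf) < 2:
            return None
        if buf[0] != 0x30:  # every LDAPMessage is a universal SEQUENCE
            raise ValueError("expected SEQUENCE tag 0x30, got 0x%02x" % buf[0])
        first = buf[1]
        if first < 0x80:  # BER short form: this byte is the length itself
            header, body = 2, first
        else:  # long form: low 7 bits give the number of length octets that follow
            count = first & 0x7F
            if count == 0 or count > self.MAX_LENGTH_OCTETS:
                # 0x80 is the indefinite form, which LDAP does not allow
                raise ValueError("unsupported BER length encoding")
            if len(buf) < 2 + count:
                return None
            header = 2 + count
            body = int.from_bytes(buf[2:header], "big")
        total = header + body
        return total if len(buf) >= total else None
```

Feeding it b"\x30\x05\x02\x01\x01\x42\x00" (an UnbindRequest with messageID 1) in arbitrary chunks yields the message only once its last byte is in; the event loop just calls feed() whenever select/poll says a socket is readable.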

Finally, what’s the point of a new product if it doesn’t offer advantages over the existing competitors? Out-of-the-box multi-master capability would be gold. I chatted to Benno about this at the time; he even suggested multi-master, but conceded it’s probably going to be the hardest part.

So the night turned into a prototyping-fest: Benno and I attacked the problem on paper and in emacs, and eventually came up with a simple algorithm that looks to us (though we haven’t been able to prove it :-) like it works for multi-master cliques of any N.

For N from 1 up to 25, we found that the number of messages passed before the clique synchronised grew roughly as N^2. That sucks, partly because we were injecting new update messages while synchronisation was still under way, but our thought experiments suggest that

  1. people are unlikely to have more than a handful of servers in a single cluster
  2. real-life updates are going to come in less frequently than they did in our test.
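For a feel for why quadratic growth is hard to avoid, here’s naive flooding, which is emphatically not the almanac algorithm: one node announces an update to everyone, and each node forwards it once to every peer except whoever told it. In a fully connected clique of N nodes that costs (N-1) + (N-1)(N-2) = (N-1)^2 messages for a single update; the Python below just counts them by simulation.

```python
from collections import deque


def flood_messages(n):
    """Count messages when one update floods a fully connected clique of n nodes.

    Node 0 originates the update. Every other node, the first time it receives
    the update, forwards it to all peers except the sender; repeats are dropped.
    """
    if n < 2:
        return 0
    seen = {0}
    queue = deque((0, peer) for peer in range(1, n))  # origin tells everyone
    sent = len(queue)
    while queue:
        sender, receiver = queue.popleft()
        if receiver in seen:
            continue  # duplicate delivery: already applied, don't forward again
        seen.add(receiver)
        for peer in range(n):
            if peer not in (sender, receiver):
                queue.append((receiver, peer))
                sent += 1
    return sent
```

flood_messages(25) comes out at 576, i.e. 24^2, for just one update.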

The test code lives in arch at jaq@spacepants.org--2004/almanac--prototype--0 (thanks to thesaurus.com for synonyms of “directory”). Check out Node.py for the master update algorithm: a simple three-conditional check and a global time counter.
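Node.py isn’t reproduced here, so the following is not its algorithm. As a hypothetical illustration of ordering updates with a counter, here’s a Lamport-style logical clock with last-writer-wins: each write is stamped (counter, node_id), higher stamps win, and the node id breaks ties so every replica ends up picking the same value. The Replica class and its method names are made up for this sketch.

```python
class Replica:
    """One node holding key -> (stamp, value), merged last-writer-wins.

    Stamps are (logical_counter, node_id) tuples, so ties between nodes are
    broken deterministically and all replicas converge on the same winner.
    """

    def __init__(self, node_id):
        self.node_id = node_id
        self.clock = 0
        self.data = {}

    def local_write(self, key, value):
        """Apply a write made on this node and return the update to send to peers."""
        self.clock += 1
        stamp = (self.clock, self.node_id)
        self.data[key] = (stamp, value)
        return key, stamp, value

    def receive(self, key, stamp, value):
        """Apply a peer's update; return True if it changed local state."""
        # Keep the counter at least as large as any stamp seen, so the next
        # local write sorts after everything this node already knows about.
        self.clock = max(self.clock, stamp[0])
        current = self.data.get(key)
        if current is None or stamp > current[0]:
            self.data[key] = (stamp, value)
            return True
        return False  # older or already-applied update: ignore
```

The catch: when two nodes write the same key concurrently, one write silently loses, because a single counter can’t tell “later” from “concurrent”.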

Benno thinks we need a vector clock to reduce the number of messages passed, and I think he’s right. But for now the algorithm works well enough that I think we can start building the rest of the server on top of it :-)
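A vector clock keeps one counter per node instead of a single scalar, so comparing two clocks tells you whether one update happened before the other or whether they’re concurrent, and two peers can compare clocks to see which updates the other hasn’t seen yet. A minimal sketch of the data structure with plain dicts; how almanac would actually use it is still an open question, and the node names are arbitrary.

```python
def vc_increment(clock, node):
    """Return a copy of clock with node's entry bumped (a local update on node)."""
    new = dict(clock)
    new[node] = new.get(node, 0) + 1
    return new


def vc_merge(a, b):
    """Element-wise maximum: the state after seeing everything in both a and b."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in set(a) | set(b)}


def vc_compare(a, b):
    """Return 'equal', 'before', 'after' or 'concurrent' for clocks a and b."""
    nodes = set(a) | set(b)
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"
```

For example, {"a": 1, "b": 1} and {"a": 1, "c": 1} compare as "concurrent", which is exactly the case a scalar counter quietly papers over.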

why use pam_ldap in the session service?

One of our RHEL3 servers started segfaulting when using su; it turns out it was because of the line

session  optional   /lib/security/$ISA/pam_ldap.so

in /etc/pam.d/system-auth, which Red Hat’s authconfig places there by default.

So I googled for what the session service is for, and why you’d want pam_ldap in there, and came up with very little: this page from 2001 gave some hints but didn’t actually say what pam_ldap does in the session phase. Everywhere else on the internet just says “Oh, add this line to your pam config.” No-one seems to know why.

So for now it’s commented out, and things look like they’re working.

But I’m still curious; why does every PAM+LDAP guide say to do this?

RSS content module

The content module defines whole big piles of goo for RSS 1.0 feeds, none of which liferea recognises at all; and yet the content:encoded tag, which (per the aforementioned link) doesn’t exist, does work, and so far the only corroborating evidence for such a tag is the guts of your local planet’s feed or somesuch.

Weird… well I’ll just go with the flow on this one.
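For what it’s worth, pulling content:encoded out of an RSS 1.0 feed is easy enough with ElementTree, as long as you use the namespace URI that feeds in the wild declare for the content: prefix, http://purl.org/rss/1.0/modules/content/. A small Python sketch (encoded_bodies() is just an illustrative helper):

```python
import xml.etree.ElementTree as ET

# Namespaces used by RSS 1.0 feeds and the content module.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rss": "http://purl.org/rss/1.0/",
    "content": "http://purl.org/rss/1.0/modules/content/",
}


def encoded_bodies(feed_xml):
    """Return (title, content:encoded HTML) for each RSS 1.0 item that has one."""
    root = ET.fromstring(feed_xml)
    results = []
    # In RSS 1.0, items are direct children of rdf:RDF, not of channel.
    for item in root.findall("rss:item", NS):
        body = item.find("content:encoded", NS)
        if body is not None and body.text:
            title = item.findtext("rss:title", default="", namespaces=NS)
            results.append((title, body.text))
    return results
```

The value is escaped or CDATA-wrapped HTML, so it comes back as an HTML string ready for whatever renders it.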

setting the content-type of myghty output

Myghty’s documentation is pretty good, ’cept for the part that tells you how to set the Content-Type header. Well, it tells you where in your code to set headers, just not how you go about doing that…

The r object (the request object) is what spits out the HTTP headers, including the content type, which defaults to text/html; you can see this in http/HTTPHandler.py and http/CGIHandler.py.

So, in your <%python scope="init"> section, just set

r.content_type = "text/xml"

or whatever you like.

Come to think of it, though, I don’t think there’s a way (at least, based on my brief glance at the code to find out about content_type) to set arbitrary HTTP headers. (If the maintainer of Myghty stumbles across this post, please make the request object a dictionary-like object, à la Python’s email.Message, thanks :-))
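To show the sort of interface meant: Python’s email.message.Message already behaves like a case-insensitive, multi-valued header dictionary. This is plain standard-library code, not anything Myghty provides; set_header() is a hypothetical helper, needed because assigning to a Message appends a second header rather than replacing the first.

```python
from email.message import Message


def set_header(headers, name, value):
    """Replace any existing header called name (case-insensitively) with value."""
    del headers[name]  # no error if absent; plain assignment alone would append a duplicate
    headers[name] = value


headers = Message()
set_header(headers, "Content-Type", "text/html")  # the default the request object starts with
set_header(headers, "content-type", "text/xml")   # replaces it, whatever the case
set_header(headers, "Cache-Control", "no-cache")  # an arbitrary extra header
```

After those calls headers.items() is [("content-type", "text/xml"), ("Cache-Control", "no-cache")], which is about all an HTTP handler needs to write out the response headers.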