spaceblog

ERROR: invalid page header in block 14719010 of relation "acct_old"

I have a broken table.

I’ve been trying to recover data from this table for a long time.

I know that block 14719010 has an invalid page header, and that this corresponds to row 614113462 in that table. This post was helpful enough to show me a way of identifying the row, but alas, the recovery steps it suggests don't work: as soon as you try to select the ctid of the row, you are informed of an invalid page header in that block.

I discovered pgfsck, but alas it doesn’t speak PostgreSQL 7.4 table format.

I want to copy the valid data out of this table into a new, working table. I know which row is broken, so I could copy out all the rows up to that limit and insert them into the other table, then try to delete the rows up to that point. Alas, two problems:

  1. The transaction ends up filling the disk rapidly due to the number of rows involved. This also has the side effect of eating all the disk space and not letting me reclaim it, because VACUUM and VACUUM FULL fail with our friendly error message.

    So I can do it in small chunks, and delete those bits once done, but…

  2. DELETE can’t take a LIMIT clause, so I can’t delete the rows once they’ve been safely copied over. And I can’t narrow it down any other way, because a WHERE clause makes the server search the entire table for matches, which means it’ll hit our friendly block and abort with our very familiar error.
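For the record, the chunked copy I have in mind looks something like this. This is only a sketch: acct_new is a hypothetical table with the same schema, the chunk size is arbitrary, and it assumes that a SELECT with no ORDER BY reads the heap sequentially, so rows before block 14719010 come out without touching it:

```sql
-- copy rows out in chunks, stopping before the broken block
-- (note each OFFSET still rescans from the start of the heap)
INSERT INTO acct_new SELECT * FROM acct_old LIMIT 1000000 OFFSET 0;
INSERT INTO acct_new SELECT * FROM acct_old LIMIT 1000000 OFFSET 1000000;
-- ... and so on, stopping before row 614113462
```

The copying half works; it’s the deleting half that’s the problem.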

I’d like to try dumping the table, in the hope that when it crashes, at least the first 614113461 rows will have been dumped. However, my attempts so far suggest that pg_dump buffers its output, so I suspect I won’t see any rows in the dump at all.
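One thing I might try instead of pg_dump: COPY writes rows out as it goes, so whatever it emits before hitting the bad block should survive in the output file. Untested, just a hunch (the_database is a stand-in name):

```sql
-- run via psql so the rows stream straight to a file:
--   psql -c "COPY acct_old TO STDOUT" the_database > acct_old.txt
COPY acct_old TO STDOUT;
```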

I can’t even identify the file on disk that is broken, to try to remove it or zero it or something. I’ve heard rumours of a pg_filedump command, but it’s nowhere to be found in the Debian 3.1 package of PostgreSQL.

… time passes…

My hopes were raised when I found I didn’t have postgresql-contrib installed, but alas, installing it turned up no pg_filedump either. packages.d.o doesn’t even know about it :(

A bit more googling for postgresql delete limit lands me at this thread. Aha! Maybe a way to shrink the old table as I clean it up!
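If the thread pans out, the trick would presumably be something along these lines: grab the ctids of the first N rows (a sequential read that stops well before the broken block) and delete exactly those. A sketch only, with the big caveat that I haven’t verified that 7.4 will plan this sensibly, or at all:

```sql
-- hypothetical delete-with-limit via ctid
DELETE FROM acct_old
 WHERE ctid IN (SELECT ctid FROM acct_old LIMIT 1000000);
```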

The battle rages on…

Why SCons rules, make drools, and autogoats eats your soul

I posted on the icculus quake3 list about the possibility of using SCons; the build system id used was cons, though they also left behind a Makefile that was clearly out of date. The discussion didn’t last long: someone suggested using automake, and the lead developers quickly doused that idea by stating that plain make would be the way to go.

It got me thinking: why did I think SCons was better than that, when there was strong demand to stay put, and the only suggestion for change was something I can’t stand? Why did I want to use SCons in the first place?

I think it all boils down to one particular use case that I find so attractive that it massively outweighs all the other failings: modifications to the build just work.

Let’s assume we have a project with three different build systems: plain make, GNU automake, and SCons. All three from a fresh checkout do the right thing and you end up with the binary you’re expecting.

The build systems also do the right thing when you modify the source code: make detects the timestamp change and recompiles; automake has generated a Makefile with the correct dependencies, so make detects the timestamp change and recompiles; SCons notices either a timestamp change or an MD5 sum change on the source file, and recompiles.

There’s a caveat here: with automake and SCons you can guarantee that your build will be correct; with plain make you need to be sure that you’ve specified the dependencies correctly. This is a corner case, it takes a special kind of project to mess it up, but it is possible, and tracking it down and fixing it can be painful: at work, other developers can and have set up the build system to do something the wrong way, and then complained to me, “I wanted to do this and it doesn’t work!”

My use case, however, is not modifying the source code, it’s modifying the build.

Say you’ve been hacking for a while, and your project is working, but you realise that you’ve foolishly left out -Wall -Werror from the CFLAGS. Let’s see what happens if you add it to each of the above projects and recompile:

  1. Plain make will do nothing. The targets already exist and the source code has not changed. Make knows nothing of recipe signatures, so the fact that the command line used to do the build has changed is of no consequence. To get a correct build you will need to make clean, and that relies on your having written a clean target correctly – more potential for getting it wrong.

    Let’s not limit ourselves to CFLAGS here; if there’s an option that only changes part of the build, you are still going to have to rebuild the whole project to take advantage of the change. That totally sucks on a project that takes between 30 seconds and 5 minutes to build: too long to wait, not long enough to go get a coffee.

  2. Automake will either do nothing, or do something wrong. Depending on whether you ran configure with --enable-maintainer-mode and have AM_MAINTAINER_MODE specified in configure.in, the generated Makefile may have no idea about its own dependencies. On the other hand, my experiences with automake 1.4, 1.5, 1.6, 1.7, and 1.8 have always involved some frustration with random regeneration errors – partial reconstruction of the build system, usually caused by make’s own reliance on timestamps – and even when it gets it right, you still have the problem of needing a full clean before a rebuild.

  3. SCons will rebuild your project correctly. SCons keeps a record of what the command line for each target was, and if that changes then it will consider the target to be out of date. If you change the CFLAGS for a project between builds, the project will be rebuilt correctly. If you change something that only affects part of the build, only those parts that are affected will be rebuilt.
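The difference in the third case comes down to what gets hashed into the “is this up to date?” decision. Here’s a toy sketch of the idea – not SCons code, just the decision rule it applies, with made-up file and command names:

```python
import hashlib
import os
import tempfile

def content_signature(path):
    # MD5 of the file's contents, like SCons's default content signature
    with open(path, "rb") as f:
        return hashlib.md5(f.read()).hexdigest()

def needs_rebuild(stored, source_sig, command_sig):
    # a target is out of date if either the source contents or the
    # command line used to build it differ from what was recorded
    return stored != (source_sig, command_sig)

fd, src = tempfile.mkstemp(suffix=".c")
os.write(fd, b"int main(void) { return 0; }\n")
os.close(fd)

command = "gcc -o hello " + src
stored = (content_signature(src), command)   # recorded at the last build

# touch the file: the timestamp changes, the contents do not
os.utime(src, None)
rebuild_after_touch = needs_rebuild(stored, content_signature(src), command)

# add -Wall -Werror: the contents are unchanged, the command line is not
new_command = "gcc -Wall -Werror -o hello " + src
rebuild_after_flags = needs_rebuild(stored, content_signature(src), new_command)

os.unlink(src)
print(rebuild_after_touch, rebuild_after_flags)  # False True
```

Make bases the same decision on timestamps alone, which is exactly why the CFLAGS change above goes unnoticed.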

There are three kinds of people who will be affected by your build system: users, distributors, and developers. The users, though automake has a lot of mindshare (with its configure; make; make install mantra), don’t really care how it builds, as long as the README explains how and it works; they only need to build it once. The distributors don’t care what builds it as long as it builds and installs into the right places, i.e. FHS compliance out of the box – --prefix and DESTDIR variables or something equivalent, so that packages can be built. They may build it several times, but it’s all encapsulated in the packaging scripts and hidden behind a single command: debian/rules or rpmbuild -ba or what have you.

Developers, who are building and hacking and building and tweaking and profiling and building, do care that their build is correct, and that the time spent interacting with the build system is as short as possible. As a developer, I don’t want to spend time cleaning, regenerating, and building just to make sure I get a correct build. I just want to type one command.

scons is that command.

pronunciation? try typing it!

Mary writes:

Now, people, you can’t seriously be expecting me to pronounce the latter as bazaar can you? C’mon, … try saying it. bzr. bzr. BZR.

I’m happy to call it ‘bazaar’, but really, try typing those three characters several hundred times a day. WORST CHARACTER SEQUENCE EVER.

Really, it should have been called teh. This is nicely spread over the home row and uses two hands. It also has the advantage that your command for retrieving a copy of the repository will be teh suck. Other useful commands include teh diffz, teh m3rj, and so on. The gentoo crowd will love it.

nagios ssh check is fucked

The nagios ssh check only checks for the banner response.

If anything after that is going to prevent you from logging in, nagios won’t tell you. It can’t do key negotiation, and it certainly won’t go as far as getting a shell. If a delay on the server causes authentication to take longer than a TCP timeout, check_ssh will still pass, and you won’t know until you have angry users.
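To see how shallow the check is, here’s a toy demonstration – not nagios code, just the same logic: a fake “sshd” that sends a plausible banner and then does nothing useful still passes a banner-only check with flying colours.

```python
import socket
import threading

def fake_sshd(server_sock):
    # accept one connection, send a plausible banner, then do nothing else:
    # no key exchange, no auth, nobody could ever actually log in
    conn, _ = server_sock.accept()
    conn.sendall(b"SSH-2.0-OpenSSH_3.8\r\n")

def banner_check(host, port, timeout=5.0):
    # roughly what check_ssh does: connect, read the banner, declare victory
    sock = socket.create_connection((host, port), timeout=timeout)
    banner = sock.recv(256)
    sock.close()
    return banner.startswith(b"SSH-")

server = socket.socket()
server.bind(("127.0.0.1", 0))
server.listen(1)
port = server.getsockname()[1]
threading.Thread(target=fake_sshd, args=(server,), daemon=True).start()

passed = banner_check("127.0.0.1", port)
print(passed)  # True, even though this "sshd" can never log anyone in
```

A check you actually want would authenticate and run a trivial command, which is exactly what check_ssh doesn’t do.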

configuration is uninteresting

After a long wait and a long story, my minipci wireless card arrived, so I chucked it into the laptop and booted up. It appeared in the lspci output, which was awesome.

On Debian, though, because Free is more important than Working, you’ve got to compile your own copy of the madwifi driver. This bores me to tears, and I’m so offended by the prospect of having to a) compile a kernel module, and b) do it again every time the kernel package is upgraded, that I’m going to forego my dreams of doing native Debian packaging work and install Ubuntu.

I didn’t do this in the first place because, for a fleeting moment, I thought it would be fun to get back into random system hacking, like what I used to do at uni; so I spent a few hours reading HOWTOs after installing the laptop, only to find that getting suspend-to-RAM working was so trivial I was surprised it didn’t already just work.

But it turns out that I’m more interested in getting my work done, than having to fiddle with things that should already just work – these problems have already been solved by other people – so I’m going to switch.

My name is Ellen Feiss, and I’m a student.