Anchor's RAID Migration howto

Fri, 21 May 2004 12:53

(reposted from http://lists.slug.org.au/archives/slug/2004/05/msg00374.html)

So here’s Anchor’s RAID migration document, tweaked for Debian unstable (previously it’d only been done on various flavours of Red Hat), based on the migration I did last night (and I expect I’ll be doing it again today on a different machine).

I’m going to skip the parts that migrate an existing RAID configuration to bigger disks or an alternate RAID scheme, and instead assume you’ve got two disks in a box and you performed the install onto the first disk (hda), thus leaving the second disk (hdb) unused.

Partition hdb how you like it. My server partitioning scheme is like this for IDE machines:

hda1 500M /
hda2 2G swap
hda3 Extended
hda5 2G /var
hda6 4G /usr
hda7 remaining /data

/data contains bind mount points for /home, /var/lib/{mysql,postgres}

Create a degraded RAID array.

mdadm -C -l 1 -n 2 /dev/md0 /dev/hdb1 missing
mdadm -C -l 1 -n 2 /dev/md1 /dev/hdb5 missing
mdadm -C -l 1 -n 2 /dev/md2 /dev/hdb6 missing
mdadm -C -l 1 -n 2 /dev/md3 /dev/hdb7 missing

-l specifies the raid level, in this case RAID-1, and -n specifies the number of devices that will be in the array, in our case 2. missing lets mdadm know that there is another device we haven’t specified yet, and the array will be built on one half.

If you’re doing this for a three-disk RAID-5, then you’d list two disks and a missing (with -l 5 -n 3).
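The relationship between -n and the device list can be made explicit with a small helper. This is only a sketch: print_create is a made-up name, /dev/hdc1 is a hypothetical third disk, and the function only prints the mdadm command rather than running it, so it is safe to try anywhere.

```shell
#!/bin/sh
# print_create LEVEL MD_DEVICE MEMBER...
# Prints (does not run) an mdadm create command. The -n count is simply
# the number of members listed, and the word "missing" counts as one.
print_create() {
    level=$1
    md=$2
    shift 2
    echo "mdadm -C -l $level -n $# $md $*"
}

# RAID-1 on one real partition plus a placeholder:
print_create 1 /dev/md0 /dev/hdb1 missing
# A three-disk RAID-5 started on two disks (hdc1 is hypothetical):
print_create 5 /dev/md0 /dev/hdb1 /dev/hdc1 missing
```

Once an array has really been created, cat /proc/mdstat should show it running with one member absent.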

Create the new filesystems:

mke2fs -j /dev/md0
...

Mount the new partitions.

mkdir /newroot
mount /dev/md0 /newroot
cd /newroot
mkdir usr var data
mount ...
mkdir data/var.lib.postgres
mkdir -p var/lib/postgres
mount -o bind data/var.lib.postgres var/lib/postgres

Copy the data from the existing root to the new root

cd /
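for mountpoint in / /var /usr /data; do
    # "dir/." copies the contents of dir, so nothing ends up nested
    cp -ax "$mountpoint/." "/newroot$mountpoint/"
done

(That list matches my partitioning scheme; use whatever mount points live on your old root disk.)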
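When the destination directory already exists, GNU cp puts a source directory inside it rather than on top of it, so it is worth confirming on scratch data that a copy lands where you expect. A minimal sketch, assuming GNU cp; copy_tree is an invented helper name, and the dir/. form is used so that only the contents of the directory are copied.

```shell
#!/bin/sh
# copy_tree SRC_DIR DEST_DIR
# Copies the contents of SRC_DIR into the existing DEST_DIR.
copy_tree() {
    # -a preserves ownership, modes, times and symlinks;
    # -x stays on the source filesystem, so other mounts are not dragged in.
    cp -ax "$1/." "$2/"
}

# Example on throwaway directories (hypothetical paths):
#   mkdir -p /tmp/demo/old/var/log /tmp/demo/newroot/var
#   copy_tree /tmp/demo/old/var /tmp/demo/newroot/var
#   ls /tmp/demo/newroot/var
```

If the ls shows the directory's children directly (and not a second var inside var), the same form is safe to use for the real mount points.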

Now comes the fun part. Shut down all services that are writing to the disk, using ps ax and netstat -lnp to find out who’s still alive.

Rsync the data that got written while the first copy was running:

for mountpoint in / /var /usr /data; do
    rsync -avnx "$mountpoint/" "/newroot$mountpoint/"
done

The -n makes this a dry run. Double check that it would do what you thought, then run it again without the -n option.

Pivot the kernel onto the new root:

mkdir /newroot/oldroot
cd /newroot
pivot_root . oldroot

If you’re on the console, you can run:

exec chroot . /bin/sh <dev/console >dev/console 2>&1

Otherwise, if you’re playing tough-guy migration via SSH, don’t do this yet.

Mount the virtual filesystems in the new root:

mount -t proc proc /proc
mount -t devpts devpts /dev/pts
mount -t tmpfs tmpfs /tmp

and for 2.6 kernels

mount -t sysfs sysfs /sys

Restart init and SSH:

telinit u
/etc/init.d/ssh restart

Now if you’re in tough guy mode, ssh into the new machine.

fuser -vm /oldroot should show you a few kernel threads and your first ssh session. If the ssh restart was successful and you’re logged in, log out of the first ssh session.

Umount the old root:

see the processes holding up the umount:

fuser -mv /oldroot

see the mounts holding up the umount:

cat /proc/mounts

umount the virtual filesystems from oldroot:

umount /oldroot/proc
umount /oldroot/dev/pts

and so on
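Working out what "and so on" covers is easier with a list of everything still mounted under /oldroot, deepest paths first, since children have to go before their parents. A rough sketch, assuming the usual /proc/mounts layout (device in the first field, mount point in the second); mounts_under is a name invented here.

```shell
#!/bin/sh
# mounts_under DIR [MOUNTS_FILE]
# Prints mount points at or below DIR, deepest first, which is the order
# they need to be unmounted in. MOUNTS_FILE defaults to /proc/mounts.
mounts_under() {
    dir=$1
    file=${2:-/proc/mounts}
    awk -v d="$dir" '$2 == d || index($2, d "/") == 1 { print $2 }' "$file" |
        LC_ALL=C sort -r
}

# Example: mounts_under /oldroot
```

The output is only a to-do list; /oldroot itself and /oldroot/initrd still need the remount and lazy umount treatment.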

Chances are /proc/mounts says you’ve got a /dev/root.old and a /dev2/root2, and you’ve got some kernel threads attached to /oldroot/initrd:

mount -o remount,ro /dev/root.old /oldroot/initrd
mount -o remount,ro /dev2/root2 /oldroot

umount -l /oldroot/initrd
umount /oldroot

The -l option to umount is a recent feature that does a lazy umount: it detaches the mount point from the filesystem namespace straight away, so it’s effectively gone, and the real umount happens once all the processes using it have finished. It’s a good idea to remount it read-only first just so you don’t break anything.

Fix /etc/fstab and /etc/mtab

/etc/fstab has the old hda filesystems on it, so fix that up.
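The fstab change is mechanical, so it can be sketched with sed. This assumes the partition-to-array pairing used when the arrays were created (partition 1 on md0, 5 on md1, 6 on md2, 7 on md3) and that fstab names devices as /dev/hdaN at the start of each line. fstab_to_md is an invented name, and the swap line is deliberately left alone since swap was never put on an array here.

```shell
#!/bin/sh
# fstab_to_md FSTAB
# Prints FSTAB with the old hda partitions replaced by the md devices
# (hda1->md0, hda5->md1, hda6->md2, hda7->md3). Output goes to stdout
# so the result can be reviewed before it replaces the real file.
fstab_to_md() {
    sed -e 's|^/dev/hda1\([[:space:]]\)|/dev/md0\1|' \
        -e 's|^/dev/hda5\([[:space:]]\)|/dev/md1\1|' \
        -e 's|^/dev/hda6\([[:space:]]\)|/dev/md2\1|' \
        -e 's|^/dev/hda7\([[:space:]]\)|/dev/md3\1|' "$1"
}

# Example: fstab_to_md /etc/fstab > /etc/fstab.new
```

Diff the new file against the old one before moving it into place.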

/etc/mtab has the old devices listed because the pivot_root doesn’t update it, so fix that up too. Cross check against /proc/mounts.
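The cross check can be done by comparing the device and mount point columns of the two files. A minimal sketch; stale_mtab_entries is a hypothetical helper, and it assumes both files use the usual whitespace-separated layout.

```shell
#!/bin/sh
# stale_mtab_entries MTAB PROC_MOUNTS
# Prints "device mountpoint" pairs that appear in MTAB but not in
# PROC_MOUNTS, i.e. entries the kernel no longer agrees with.
stale_mtab_entries() {
    a=$(mktemp) && b=$(mktemp) || return 1
    awk '{ print $1, $2 }' "$1" | LC_ALL=C sort -u > "$a"
    awk '{ print $1, $2 }' "$2" | LC_ALL=C sort -u > "$b"
    # comm -23 keeps only the lines unique to the first file
    LC_ALL=C comm -23 "$a" "$b"
    rm -f "$a" "$b"
}

# Example: stale_mtab_entries /etc/mtab /proc/mounts
```

Treat the output as hints rather than gospel: the kernel may name the root device differently from how you would write it in mtab.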

Update the boot loader.

Debian unstable uses grub, so something like this will install the first stage bootloader into both MBRs:

grub
grub> device (hd0) /dev/hda
grub> device (hd1) /dev/hdb
grub> root (hd0,0)
grub> setup (hd0)
grub> root (hd1,0)
grub> setup (hd1)

That tells grub that its hd0 is Linux’s hda, to use hda1 as the location for grub’s files (/boot/grub, as /boot is on / in my case) and to install the MBR on /dev/hda. The second pass does the same thing on hdb, using /dev/hdb1 and hdb’s MBR.

double check it worked by looking for the string GRUB in the output of

dd if=/dev/hdX count=1 | strings
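The same check can be wrapped up so it gives a yes/no answer per disk. A small sketch; has_grub_mbr is a made-up name, and it uses grep instead of strings so that it only depends on tools found on a minimal install.

```shell
#!/bin/sh
# has_grub_mbr DEVICE_OR_FILE
# Succeeds if the first 512-byte sector contains the string "GRUB".
has_grub_mbr() {
    dd if="$1" bs=512 count=1 2>/dev/null | LC_ALL=C grep -q GRUB
}

# Example:
#   for d in /dev/hda /dev/hdb; do
#       has_grub_mbr "$d" && echo "$d: GRUB found" || echo "$d: no GRUB"
#   done
```

Finding the string only shows that the first stage was written; it says nothing about whether the later stages will load.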

Fix the initrd

I always forget this part and end up booting off of half of the / array which has the effect of destroying the raid superblock, requiring /dev/md0 to be reconstructed afterwards. So the moral is DON’T FORGET THIS PART.

mkinitrd -k -o /boot/initrd.img.tmp

Look in the temporary directory that mkinitrd left its files in, /tmp/mkinitrd.*/initrd and make sure that the file ‘script’ contains a line that builds the /dev/md0 array, like this:

mdadm -A /devfs/md/0 -R -u ...

It’ll probably be building it using only /dev/hdb1 at this point; that’s fine.
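Since this is the step that gets forgotten, a scripted check may help. A sketch only: script_assembles is an invented helper that looks for an mdadm -A line naming the given device node in the script file, and it does not look at the remaining arguments on that line.

```shell
#!/bin/sh
# script_assembles SCRIPT MD_NODE
# Succeeds if SCRIPT contains an "mdadm -A" line naming MD_NODE.
# MD_NODE is used inside a regular expression, so keep it to a plain path.
script_assembles() {
    grep -E -q "mdadm[[:space:]]+-A[[:space:]]+$2([[:space:]]|\$)" "$1"
}

# Example (pass the script from the directory mkinitrd left behind):
#   script_assembles /path/to/initrd/script /devfs/md/0 || echo "md0 NOT assembled"
```

If more than one mkinitrd temporary directory is lying around in /tmp, make sure you are checking the one from the latest run.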

Fix mdadm.conf.

/etc/init.d/mdadm and /etc/init.d/mdadm-raid will automagically build the remaining arrays at boot time if you get this right, otherwise the fsck will bomb out because /dev/md1 and friends are corrupted (i.e. don’t really exist)

So, in /etc/mdadm/mdadm.conf:

DEVICE /dev/hda* /dev/hdb*
ARRAY /dev/md0 devices=/dev/hda1,/dev/hdb1
ARRAY /dev/md1 devices=/dev/hda5,/dev/hdb5
ARRAY /dev/md2 devices=/dev/hda6,/dev/hdb6
ARRAY /dev/md3 devices=/dev/hda7,/dev/hdb7

Make sure you remember the DEVICE line, otherwise it’ll still fail…
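A quick sanity check of the file before rebooting could look like the following. It is a sketch with an invented function name, and it only checks that the lines are present, not that the devices listed in them are right.

```shell
#!/bin/sh
# check_mdadm_conf CONF
# Fails with a message if CONF has no DEVICE line; otherwise prints the
# number of ARRAY lines (and returns non-zero if that number is 0).
check_mdadm_conf() {
    if ! grep -q '^[[:space:]]*DEVICE[[:space:]]' "$1"; then
        echo "no DEVICE line in $1" >&2
        return 1
    fi
    grep -c '^[[:space:]]*ARRAY[[:space:]]' "$1"
}

# Example: check_mdadm_conf /etc/mdadm/mdadm.conf
```

For the layout in this document the expected answer is 4, one per array.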

Reconstruct the RAID array from the now free hda

sfdisk -d /dev/hdb | sfdisk /dev/hda

That’ll copy the partition table from hdb, your good disk, to hda, the missing disk.

Hot add the partitions to the array:

mdadm -a /dev/md0 /dev/hda1
mdadm -a /dev/md1 /dev/hda5
mdadm -a /dev/md2 /dev/hda6
mdadm -a /dev/md3 /dev/hda7

ramp up the reconstruction speed:

echo 1000000000 > /proc/sys/dev/raid/speed_limit_max
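If you want to put the limit back afterwards, note the old value before overwriting it. A minimal sketch with a hypothetical helper name; as far as I know the value is in kilobytes per second, but treat that unit as an assumption and check your kernel’s md documentation.

```shell
#!/bin/sh
# raise_speed_limit NEW_VALUE [FILE]
# Prints the previous limit, then writes the new one. FILE defaults to
# the md tunable used above.
raise_speed_limit() {
    file=${2:-/proc/sys/dev/raid/speed_limit_max}
    cat "$file" || return 1
    echo "$1" > "$file"
}

# Example:
#   old=$(raise_speed_limit 1000000000)
#   ... wait for the rebuild to finish ...
#   echo "$old" > /proc/sys/dev/raid/speed_limit_max
```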

watch the progress:

watch "cat /proc/mdstat"
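If you’d rather have something to script against than a screen to stare at, the same file can be tested for completion. This is a sketch that assumes the usual /proc/mdstat layout, where a degraded mirror shows a status map like [U_] and a running rebuild shows a "recovery" line; rebuild_done is a name made up for the example.

```shell
#!/bin/sh
# rebuild_done [MDSTAT_FILE]
# Succeeds only when no array is degraded or still syncing.
# Returns 2 if the file cannot be read at all.
rebuild_done() {
    file=${1:-/proc/mdstat}
    [ -r "$file" ] || return 2
    # A "_" inside the [UU] status map marks a missing or rebuilding member;
    # "recovery" or "resync" marks a sync that is running or queued.
    if grep -E -q '\[[U_]*_[U_]*\]|recovery|resync' "$file"; then
        return 1
    fi
    return 0
}

# Example:
#   until rebuild_done; do sleep 60; done; echo "all arrays in sync"
```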

Do the boot loader again, because it’s fun, and likely stuff has moved around on hda1.

Don’t do anything on hda until the raid reconstruction is finished.

At this point you can continue using the machine, in fact as early as the “umount oldroot” step you can restart all your services and the machine will be back online: that’s a downtime of only as long as it takes to do the final rsync before pivoting.

I’d recommend rebooting soon after, though, so you can make sure you got the bootloader and initrd parts right. If the raid array isn’t getting constructed by your initrd.img.tmp, you can get away with changing the kernel root= option to /dev/hda1 at boot time. Don’t delete or overwrite any of your initrds whilst you’re debugging; only once the machine boots without assistance should you overwrite the initrd.img that’s listed in the grub menu.lst.