(reposted from http://lists.slug.org.au/archives/slug/2004/05/msg00374.html)
So here’s Anchor’s RAID migration document, tweaked for Debian unstable
(previously it’d only been done on various flavours of Red Hat), based
on the migration I did last night (and I expect I’ll be doing it again
today on a different machine).
I’m going to skip the parts that migrate an existing RAID configuration
to bigger disks or an alternate RAID scheme, and instead assume you’ve
got two disks in a box and you performed the install onto the first
disk (hda), thus leaving the second disk (hdb) unused.
- Partition hdb how you like it. My server partitioning scheme for IDE
machines looks like this:
hda1   500M        /
hda2   2G          swap
hda3   Extended
hda5   2G          /var
hda6   4G          /usr
hda7   remaining   /data
/data contains bind mount points for /home and
/var/lib/{mysql,postgres}.
- Create a degraded RAID array.
mdadm -C -l 1 -n 2 /dev/md0 /dev/hdb1 missing
mdadm -C -l 1 -n 2 /dev/md1 /dev/hdb5 missing
mdadm -C -l 1 -n 2 /dev/md2 /dev/hdb6 missing
mdadm -C -l 1 -n 2 /dev/md3 /dev/hdb7 missing
-l specifies the RAID level, in this case RAID-1, and -n specifies the
number of devices that will be in the array, in our case 2. missing
lets mdadm know that there is another device we haven’t specified yet,
so the array will be built on one half.
If you’re doing this on RAID-5, then you’d use two disks and a missing.
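The four create commands above follow one pattern, so they can be sketched as a loop over the partition numbers used in this post. The function name and the DRY_RUN switch are made up for illustration and are not part of mdadm. By default the sketch only prints the commands, so nothing is touched until you ask for it.

```shell
#!/bin/sh
# Sketch only: create_degraded_mirrors and DRY_RUN are made-up names.
# With DRY_RUN=1 (the default) the commands are printed, not executed.
create_degraded_mirrors() {
    md=0
    for part in 1 5 6 7; do
        cmd="mdadm -C -l 1 -n 2 /dev/md$md /dev/hdb$part missing"
        if [ "${DRY_RUN:-1}" = 1 ]; then
            echo "$cmd"
        else
            $cmd
        fi
        md=$((md + 1))
    done
}

create_degraded_mirrors
```

Run it once as is to review the output, then again with DRY_RUN=0 as root if the commands look right for your layout.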
- Create the new filesystems:
mke2fs -j /dev/md0
...
- Mount the new partitions.
mkdir /newroot
mount /dev/md0 /newroot
cd /newroot
mkdir usr var data
mount ...
mkdir data/var.lib.postgres
mkdir -p var/lib/postgres
mount -o bind data/var.lib.postgres var/lib/postgres
- Copy the data from the existing root to the new root
cd /
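# one pass per mountpoint on the old root; adjust the list to your layout
for mountpoint in / /var /usr /data; do
    # "/." copies the directory's contents rather than nesting it
    cp -ax "${mountpoint%/}/." "/newroot${mountpoint%/}/"
done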
Now comes the fun part. Shut down all services that are writing to the
disk, using ps ax and netstat -lnp to find out who’s still alive.
Rsync the data that just got written:
for mountpoint in / /var /usr /data; do
    rsync -avnx "${mountpoint%/}/" "/newroot${mountpoint%/}/"
done
Double check that it did what you thought, then remove the -n (dry run)
option and run it again.
- Pivot the kernel onto the new root:
mkdir /newroot/oldroot
cd /newroot
pivot_root . oldroot
If you’re on the console, you can
exec chroot . /bin/sh <dev/console >dev/console 2>&1
otherwise, if you’re playing tough-guy-migration via SSH, don’t do
this yet
mount -t proc proc /proc
mount -t devpts devpts /dev/pts
mount -t tmpfs tmpfs /tmp
and for 2.6 kernels
mount -t sysfs sysfs /sys
telinit u
/etc/init.d/ssh restart
Now if you’re in tough guy mode, ssh into the new machine.
fuser -vm /oldroot
should show you a few kernel threads and your first
ssh session. If the ssh restart was successful and you’re logged in,
log out of the first ssh session.
see the processes holding up the umount
fuser -mv /oldroot
see the mounts holding up the umount
cat /proc/mounts
umount the virtual filesystems from oldroot:
umount /oldroot/proc
umount /oldroot/dev/pts
and so on
Chances are /proc/mounts says you’ve got a /dev/root.old and a
/dev2/root2, and you’ve got some kernel threads attached to
/oldroot/initrd:
mount -o remount,ro /dev/root.old /oldroot/initrd
mount -o remount,ro /dev2/root2 /oldroot
umount -l /oldroot/initrd
umount /oldroot
The -l option to umount is a recent feature that does a lazy umount: it
detaches the filesystem from the mount namespace straight away, so
it’s effectively gone, but it only gets properly unmounted once all
processes using it are finished. It’s a good idea to make it
read-only first just so you don’t break anything.
- Fix /etc/fstab and /etc/mtab
/etc/fstab has the old hda filesystems on it, so fix that up.
/etc/mtab has the old devices listed because the pivot_root doesn’t
update it, so fix that up too. Cross check against /proc/mounts.
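One way to cross check the edited files is to list any entries that still point at the old disk. This is a minimal sketch: stale_entries is a made-up helper name, and it assumes the usual fstab/mtab layout with the device in the first column and the mount point in the second. Swap was not mirrored in this walkthrough, so an hda2 swap line showing up here may be intentional.

```shell
#!/bin/sh
# Sketch only: stale_entries is a made-up helper name.
# Prints "device mountpoint" for every line whose device is still a
# partition of the old disk (hda). Comment lines never match.
stale_entries() {
    awk '$1 ~ /^\/dev\/hda[0-9]+$/ { print $1, $2 }' "$1"
}

# Usage: check both files after editing them
# stale_entries /etc/fstab
# stale_entries /etc/mtab
```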
Debian unstable uses grub, so something like this will install the
first stage bootloader into both MBRs:
grub
grub> device (hd0) /dev/hda
grub> device (hd1) /dev/hdb
grub> root (hd0,0)
grub> setup (hd0)
grub> root (hd1,0)
grub> setup (hd1)
That tells grub that its hd0 is Linux’s hda, to use hda1 as the
location for grub’s files (/boot/grub, as /boot is on / in my case)
and to install the MBR on /dev/hda. The second pass is to do the same
thing on hdb, using /dev/hdb1 and hdb’s MBR.
Double check it worked by looking for the string GRUB in the output of:
dd if=/dev/hdX count=1 | strings
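The same check can be scripted so that both disks get tested every time. This is a sketch under assumptions: has_grub_signature is a made-up helper name, and it greps the raw first sector directly instead of piping through strings, which should find the same literal text.

```shell
#!/bin/sh
# Sketch only: has_grub_signature is a made-up helper name.
# Reads the first 512-byte sector of the given disk (or image file)
# and succeeds if the literal string GRUB appears in it.
has_grub_signature() {
    dd if="$1" bs=512 count=1 2>/dev/null | LC_ALL=C grep -q GRUB
}

# Usage (as root):
# for disk in /dev/hda /dev/hdb; do
#     has_grub_signature "$disk" && echo "$disk: ok" || echo "$disk: NO GRUB"
# done
```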
I always forget this part and end up booting off of half of the /
array which has the effect of destroying the raid superblock,
requiring /dev/md0 to be reconstructed afterwards. So the moral is
DON’T FORGET THIS PART.
mkinitrd -k -o /boot/initrd.img.tmp
Look in the temporary directory that mkinitrd left its files in,
/tmp/mkinitrd.*/initrd, and make sure that the file ‘script’ contains a
line that builds the /dev/md0 array, like this:
mdadm -A /devfs/md/0 -R -u ...
It’ll probably be building it using only /dev/hdb1 at this moment;
that’s fine.
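Since this is the step that is easy to forget, a small check may help. The sketch below only tests that the script file has some line running mdadm -A. It does not verify the arguments, so still read the line yourself. script_assembles_md is a made-up helper name.

```shell
#!/bin/sh
# Sketch only: script_assembles_md is a made-up helper name.
# Succeeds if the given initrd 'script' file has a line that runs
# "mdadm -A" (assemble).
script_assembles_md() {
    grep -q 'mdadm -A' "$1"
}

# Usage, with the glob from the text:
# for s in /tmp/mkinitrd.*/initrd/script; do
#     script_assembles_md "$s" && echo "$s: ok" || echo "$s: no mdadm -A line"
# done
```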
/etc/init.d/mdadm and /etc/init.d/mdadm-raid will automagically build
the remaining arrays at boot time if you get this right, otherwise the
fsck will bomb out because /dev/md1 and friends are corrupted (i.e.
don’t really exist).
So, in /etc/mdadm/mdadm.conf:
DEVICE /dev/hda* /dev/hdb*
ARRAY /dev/md0 devices=/dev/hda1,/dev/hdb1
ARRAY /dev/md1 devices=/dev/hda5,/dev/hdb5
ARRAY /dev/md2 devices=/dev/hda6,/dev/hdb6
ARRAY /dev/md3 devices=/dev/hda7,/dev/hdb7
Make sure you remember the DEVICE line, otherwise it’ll still fail…
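A quick sanity check on the finished file can catch a forgotten DEVICE line before the reboot does. This is a rough sketch: check_mdadm_conf is a made-up helper name, and it only counts lines, it does not validate the device names.

```shell
#!/bin/sh
# Sketch only: check_mdadm_conf is a made-up helper name.
# Succeeds if the file has at least one DEVICE line and exactly the
# expected number of ARRAY lines (second argument).
check_mdadm_conf() {
    conf=$1
    want=$2
    grep -q '^DEVICE[[:space:]]' "$conf" || {
        echo "no DEVICE line" >&2
        return 1
    }
    # grep -c exits non-zero on a zero count, hence the "|| true"
    have=$(grep -c '^ARRAY[[:space:]]' "$conf" || true)
    [ "$have" -eq "$want" ] || {
        echo "found $have ARRAY lines, wanted $want" >&2
        return 1
    }
}

# Usage for the four arrays in this post:
# check_mdadm_conf /etc/mdadm/mdadm.conf 4 && echo "looks sane"
```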
- Reconstruct the RAID array from the now free hda
sfdisk -d /dev/hdb | sfdisk /dev/hda
That’ll copy the partition table from hdb, your good disk, to hda, the
missing disk.
Hot add the partitions to the array:
mdadm -a /dev/md0 /dev/hda1
mdadm -a /dev/md1 /dev/hda5
mdadm -a /dev/md2 /dev/hda6
mdadm -a /dev/md3 /dev/hda7
ramp up the reconstruction speed:
echo 1000000000 > /proc/sys/dev/raid/speed_limit_max
watch the progress:
watch "cat /proc/mdstat"
- Do the boot loader again, because it’s fun, and likely stuff has moved
around on hda1.
Don’t do anything on hda until the raid reconstruction is finished.
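To wait for the reconstruction without staring at the screen, something like the following could poll /proc/mdstat. It is a sketch that assumes a rebuild shows up there as a line mentioning recovery or resync. rebuild_in_progress is a made-up helper name.

```shell
#!/bin/sh
# Sketch only: rebuild_in_progress is a made-up helper name.
# Succeeds while the given mdstat file (default /proc/mdstat) still
# mentions a recovery or resync.
rebuild_in_progress() {
    grep -Eq 'recovery|resync' "${1:-/proc/mdstat}"
}

# Usage: block until every array has finished rebuilding
# while rebuild_in_progress; do sleep 60; done
```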
At this point you can continue using the machine, in fact as early as
the “umount oldroot” step you can restart all your services and the
machine will be back online: that’s a downtime of only as long as it
takes to do the final rsync before pivoting.
I’d recommend rebooting soon after though so you can make sure you got
the bootloader and initrd part right. During booting you can get away
with changing the kernel root= option to use /dev/hda1 if the raid array
isn’t getting constructed in your initrd.img.tmp. Don’t delete or
overwrite any of your initrds whilst you’re debugging; only once it
boots without assistance should you overwrite the initrd.img that’s
listed in the grub menu.lst.