Recovering Ubuntu 8.04 LTS from a Failed Hard Disk

You may have noticed that the web server hosting my homepage and download site has been down since the weekend. I first noticed problems when the sites responded to requests with MySQL errors.

Running fsck returned the message “Bad magic number in super-block”, which means that e2fsck could not find a valid superblock where it expected one. It turned out that the partition table had been destroyed, but mke2fs -n still reported the locations of some backup superblocks.
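For readers who have not worked with backup superblocks, here is a minimal sketch of that step. The helper name backup_superblocks and the device /dev/sdX1 are placeholders, not taken from the original session, and the block number in the last comment is only an example. mke2fs -n writes nothing to the disk; it only prints what it would do, and the list it prints is only useful if the filesystem was originally created with the same parameters.

```shell
#!/bin/sh
# Hypothetical helper: read `mke2fs -n` output on stdin and print the
# backup superblock numbers, one per line.
backup_superblocks() {
    # The numbers follow the "Superblock backups stored on blocks:" line,
    # separated by commas and possibly wrapped over several lines.
    sed -n '/Superblock backups stored on blocks:/,$p' \
        | sed '1d' \
        | tr -d ' \t' | tr ',' '\n' \
        | grep -E '^[0-9]+$'
}

# Assumed usage on the real disk (placeholder device, run as root):
#   mke2fs -n /dev/sdX1 | backup_superblocks
#   e2fsck -b 32768 /dev/sdX1    # retry with one of the printed blocks
```

If e2fsck accepts one of the backup superblocks, the filesystem structure itself is probably intact and the damage lies elsewhere.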

I started up the PC with a Knoppix Live CD to repair the broken disk.

In a couple of forums I found the utility TestDisk, which re-creates lost partitions, and it recovered both the boot and the swap partition. However, e2fsck still failed with the messages:

Attempt to read block from filesystem resulted in short read
Attempt to read block from filesystem resulted in short read
reading journal superblock
Attempt to read block from filesystem resulted in short read
while checking ext3 journal

It was now clear that the hard disk could not be repaired, so I got a new one and copied the original hard disk onto it with a program called ddrescue.

Ddrescue is a great tool (documentation), as it copies one device onto another, displaying the number of read errors and the total size of the unreadable areas. The amazing thing (if you don’t know how it works) is that the error size *shrinks* after the first full scan of the source disk, because ddrescue goes back and rereads the failed areas in smaller pieces. The initial 2.5 GB of unreadable disk was finally reduced to about 15 MB.
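The following is a sketch of a common two-step ddrescue run, not a record of the exact commands used in this recovery. The helper name ddrescue_plan, the device names /dev/sdX and /dev/sdY, and the log file name rescue.log are all placeholders. The helper only prints the commands, since running them against the wrong device overwrites it.

```shell
#!/bin/sh
# Hypothetical helper: print (not run) a two-step ddrescue plan.
# $1 = failing source disk, $2 = new target disk, $3 = log file.
ddrescue_plan() {
    src=$1; dst=$2; log=$3
    # Step 1: copy everything that reads easily and skip the slow
    # splitting/scraping of bad areas (-n).
    # -f is needed to allow overwriting a device instead of a file.
    echo "ddrescue -f -n $src $dst $log"
    # Step 2: reuse the same log file so only the bad areas are retried,
    # with direct disc access (-d) and up to 3 retry passes (-r3).
    echo "ddrescue -f -d -r3 $src $dst $log"
}

# Example with placeholder device names; double-check them before
# running the printed commands, because the target is overwritten.
ddrescue_plan /dev/sdX /dev/sdY rescue.log
```

The log file is what makes the second step cheap: ddrescue records which areas were already copied, so a later run with the same log file only touches what is still missing.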

After ddrescue was finished, I ran another fsck on the new disk, this time successfully. Time to reboot.

The reboot brought a black screen, with “1234F” the only thing displayed. It turned out that this came from the MBR written by TestDisk, which could not find a bootable partition. I needed to get GRUB back.

The Knoppix disk would not help me now (it names the disks /dev/hda instead of /dev/sda), but fortunately I already had an Ubuntu 8.10 Desktop disk, which also works as a Live CD.

I booted the Ubuntu CD and restored GRUB as sketched in this forum thread:

sudo grub
find /boot/grub/stage1
root (hd0,0)
setup (hd0)

Now at least GRUB was booting, but it also served me the next error message:

Kernel panic: VFS: Unable to mount root fs on unknown-block(0,0)

I was trying to understand GRUB and the disk UUIDs mentioned in menu.lst when I guessed that the problem was caused by a broken initrd.img.
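One way to rule out a UUID mismatch is to list the UUIDs that menu.lst refers to and compare them with what blkid reports for the copied partition. This is only a sketch: the helper name menu_lst_uuids, the mount point /mnt, and the device /dev/sdX1 are placeholders.

```shell
#!/bin/sh
# Hypothetical helper: print the distinct root UUIDs referenced in a
# GRUB menu.lst file, one per line.
# $1 = path to menu.lst
menu_lst_uuids() {
    grep -o 'root=UUID=[0-9a-fA-F-]*' "$1" \
        | sed 's/^root=UUID=//' \
        | sort -u
}

# Assumed usage from the Live CD, with the root partition mounted at /mnt:
#   menu_lst_uuids /mnt/boot/grub/menu.lst
#   sudo blkid /dev/sdX1      # the UUID shown here should match
```

If the UUIDs match, as a sector-by-sector copy would suggest, the boot failure has to come from somewhere else, such as the initrd.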

I backed up the original initrd.img-2.6.24-16-server and copied the initrd.img-2.6.24-16-server.bak to the original name. And it worked!
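The swap described above can be sketched as a small shell helper. The function name restore_initrd, the .broken suffix, and the /mnt mount point are assumptions made for illustration; only the initrd file name comes from this machine.

```shell
#!/bin/sh
# Hypothetical helper: keep the current initrd as <name>.broken and
# put a copy of <name>.bak in its place.
# $1 = boot directory, $2 = initrd file name.
restore_initrd() {
    dir=$1; name=$2
    # Refuse to do anything if there is no backup to restore.
    [ -f "$dir/$name.bak" ] || { echo "no backup: $dir/$name.bak" >&2; return 1; }
    # Move the suspect image aside instead of deleting it.
    if [ -f "$dir/$name" ]; then
        mv "$dir/$name" "$dir/$name.broken" || return 1
    fi
    # Copy rather than move, so the .bak file itself stays available.
    cp -p "$dir/$name.bak" "$dir/$name"
}

# Assumed usage from the Live CD, with the root partition mounted at /mnt:
#   restore_initrd /mnt/boot initrd.img-2.6.24-16-server
```

Keeping the suspect image under a different name means the change can be undone if the backup turns out to be no better.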

As far as I can tell, the machine is back online and fully functioning again. But it was quite a trip 😉

