Note: this is a translation of an old post. I decided to translate it because now I'm looking for a SysAdmin position (tell your friends!) and I would like this post to show how I work.

Last Saturday I received an email from one of the guys from work with the subject «urgennnnnnnnt: heeeeeeeeeelp»[sic]. He says he was idling on Friday night when his machine stopped emiting sound via the soundcard and then it behaved erratically. When he tried rebooting it, it didn't boot anymore. «It says something about disk not bootable...».

Monday morning I go to work and go to see the machine. Precisely, it said something about «disk not bootable». I boot with a USB key with GRML and I find that the disk has no partitions.

Panic.

The guy is doing a PostDoc in something astronomical (literally) and all his work is in that machine. No backups, as usual, so I prepare myself to rescue the partitions.

In that same USB key I have a system with parted. I boot with it and I try using parted's rescue tool. Nothing. I ask the guy how the disk was partitioned, etc. He tells me that he only installed Kubuntu clicking 'Next'. Kubuntu by default creates a swap partition and an ext3 partition for / and that's it, which made what was coming relatively easy.

I reboot in GRML and I use hexdump -C /dev/sda | more to see the disk's content. This is not the first time that I juggle with partitions and MBRs, but last time I did it, I used a tool that is now discontinued (the tool was called DiskEdit, included in The Norton Utilities), which had special edit modes for MBRs, FATs, and a lot of useful things... in MS universe.

First I confirm that, yes, the first sector is a MBR (at least it has the 0x55aa signature at the end), and that the whole partition table is empty, but in the second sector of the disk there seems to be a copy. I take pen and paper, write down what I found, but it turns out not only I have half the data, the partition I thought I found was too small.

So I decide to look for the partition by hand. To do that I needed to find out first how does the ext3 kernel code know wether a partition is ext3 or not. I knew it would be some kind of magic signature, but I had no idea which. So I installed the sources for 2.6.29 in my laptop and started to look at ext3's code. After going around a lot, including reading the code that is excuted when you mount a filesystem of type ext3, where we can see that it uses a magic signature[3] and the structure of the ext3 superblock, where we can see the magic's offset is 0x38.

So the problem of finding an ext3 partition is reduced to the problem of finding 0x53ef (damn little endian) at a sector's offset 0x38 in the disk. Luckily more has a find tool, so I sit down to search every occurrence of 53 ef, hoping that the address at the left ends in 30 and that they would be the 9th and 10th bytes in the line (damn 0 based offsets).

A few 'next' after, I get my first candidate. It looks good, because I was also comparing my findings with a similar dump from my USB key (which I have formatted as ext2, and luckily ext2 and ext3 share those structures), and also I spot something that looks like a uuid.

This candidate's address is 0x80731038. I substract 0x38 and I get the address 0x80731000, a nice round number for a superblock. Converted to decimal that's 2.155.024.384, some 2GiB from the disk's begginning. Looks really good! The swap partition could be before the root one, and could have that size.

I use fdisk /dev/sda to see the (still empty) partition table. It says there's 16.065 sectors per cylinder, times 512 bytes per sector, equals 8.225.280 bytes per cylinder. Almost all distros (actually I think all of them) partition disks at cylinder boundaries[1], so if the sector I found is right at the beginning of a cylinder...

I divide 2.155.024.384/8.225.280=...

(suspense pause)[2]

262.000124494...

¡Damn! I almost had it... Hmm, how much is the factional part? (262.000124494-262)*8.225.280=... ¡1024! ¿Is it that...?

I run strace debugfs -R show_super_stats /dev/sdb1 (the partition in my USB key) and I see that it actually seeks 1024 bytes within the partition for reading the superblock!

This is it. With 262 in my head, I fire fdisk /dev/sda and I create two partitions: swap in cylinders 1-261 and root from cylinder 262 till the end. I save, cross my fingers and I run debugfs -R show_super_stats /dev/sda1. It fails! What's wrong? I reboot and I try again, just in case the kernel did not re-read correctly the partition table. It fails again. WTF?

Ah, duh, it's sda2, where do I have my head... Ok, debugfs -R show_super_stats /dev/sda2... it works, the sonofabitch works! I can't believe it. I risk it: fsck -n /dev/sda2. «Filesystem is clean». Damn, I try harder: fsck -n -f /dev/sda2...

Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/sda2 etc etc...

It's fine! But the MBR doesn't have GURB installed, so I do the usual GRUB reinstall process, I reboot...

It boots like nothing has happened, and it finishes in a beautiful login. Satisifed, I pat myself in the back, pack my things and I start my weekend.

sysadmin rescue


[1] ... wasting some 8MiB between the MBR and the first partition.

[2] The sharp ones reading this will notice that this can not give an integer by no means.

[3] Reiser magics are funny. Looks like he started the fad that now AdOlEsCeNtS use.