Note: this is a translation of an old post. I decided to translate it because I'm now looking for a position (tell your friends!) and I would like this post to show how I work.
Last Saturday I received an email from one of the guys from work with the subject «urgennnnnnnnt: heeeeeeeeeelp»[sic]. He says he was idling on Friday night when his machine stopped emitting sound via the soundcard and then started behaving erratically. When he tried rebooting it, it didn't boot anymore. «It says something about disk not bootable...».
Monday morning I go to work and go to see the machine. Indeed, it says something about «disk not bootable». I boot with a USB key with GRML and I find that the disk has no partitions.
The guy is doing something astronomical (literally) and all his work is on that machine. No backups, as usual, so I prepare myself to rescue the partitions.
In that same USB key I have a system with
parted. I boot with it and I try using
parted's rescue tool. Nothing. I ask the guy how the disk was partitioned, etc.
He tells me that he only installed Kubuntu clicking 'Next'. Kubuntu by default
creates a swap partition and an ext3 partition for / and that's it, which made
what was coming relatively easy.
I reboot in GRML and I use
hexdump -C /dev/sda | more to see the disk's
content. This is not the first time that I juggle with partitions and MBRs,
but last time I did it, I used a tool that is now discontinued (it was
included in The Norton Utilities), which had special edit modes
for MBRs, FATs, and a lot of useful things... in the MS universe.
First I confirm that, yes, the first sector is an MBR (at least it has the
0x55aa signature at the end), and that the whole partition
table is empty,
but in the second sector of the disk there seems to be a copy. I take pen and
paper and write down what I found, but it turns out that not only do I have
just half the data, but the partition I thought I found is too small.
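For readers who have never looked inside an MBR, here is a minimal Python sketch of the check described above. It assumes the classic layout: four 16-byte partition entries starting at offset 0x1BE and the 55 aa signature in the last two bytes of the sector. The function names are made up for this illustration; this is not the tool I used.

```python
import struct

SECTOR_SIZE = 512
TABLE_OFFSET = 0x1BE      # the four 16-byte partition entries start here
SIGNATURE_OFFSET = 0x1FE  # last two bytes of the sector


def parse_mbr(sector: bytes):
    """Return (has_signature, entries) for a 512-byte first sector."""
    if len(sector) != SECTOR_SIZE:
        raise ValueError("expected exactly one 512-byte sector")
    has_signature = sector[SIGNATURE_OFFSET:SIGNATURE_OFFSET + 2] == b"\x55\xaa"
    entries = []
    for i in range(4):
        raw = sector[TABLE_OFFSET + 16 * i:TABLE_OFFSET + 16 * (i + 1)]
        # status, CHS start (3 bytes), type, CHS end (3 bytes),
        # first sector (LBA) and sector count, both little endian
        status, _chs_start, ptype, _chs_end, lba_start, sectors = struct.unpack(
            "<B3sB3sII", raw
        )
        entries.append(
            {"status": status, "type": ptype, "lba_start": lba_start, "sectors": sectors}
        )
    return has_signature, entries


def table_is_empty(entries):
    """True when no entry describes a partition (type 0 and zero length)."""
    return all(e["type"] == 0 and e["sectors"] == 0 for e in entries)
```

On a real disk the sector would come from reading the first 512 bytes of the device as root; the sketch only deals with the bytes once you have them.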
So I decide to look for the partition by hand. To do that I first needed to find out how the ext3 kernel code knows whether a partition is ext3 or not. I knew it would be some kind of magic signature, but I had no idea which. So I installed the sources for 2.6.29 on my laptop and started to look at ext3's code. After going around a lot, I ended up reading the code that is executed when you mount a filesystem of type ext3, where we can see that it uses a magic signature, and the structure of the ext3 superblock, where we can see that the magic's offset is 0x38.
So the problem of finding an ext3 partition is reduced to the problem of finding
the bytes 53 ef (the magic is 0xEF53; damn little endian) at a sector's offset 0x38 in the disk. Luckily
more has a find tool, so I sit down to search every occurrence of
53 ef, hoping that the address at the left ends in
30 and that they would be the
9th and 10th bytes in the line (damn 0 based offsets).
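The manual search can be sketched in a few lines of Python. This is only an illustration, assuming 512-byte sectors and a chunk of the disk already read into memory; `find_magic_candidates` is a name invented here. A match is just a candidate: any sector that happens to hold 53 ef at that offset will show up, and so will backup superblocks.

```python
EXT_MAGIC = b"\x53\xef"   # 0xEF53 as it is stored on disk (little endian)
MAGIC_OFFSET = 0x38       # offset of the magic inside the superblock
SECTOR_SIZE = 512


def find_magic_candidates(data: bytes, base: int = 0):
    """Yield the address of every sector in `data` that has the
    ext2/ext3 magic at offset 0x38. `base` is the address of data[0]."""
    for start in range(0, len(data) - MAGIC_OFFSET - 1, SECTOR_SIZE):
        pos = start + MAGIC_OFFSET
        if data[pos:pos + 2] == EXT_MAGIC:
            yield base + start
```

Each yielded address is the start of a sector that could be a superblock, which is the same number you get by subtracting 0x38 from the address where the magic was spotted.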
A few 'next's later, I get my first candidate. It looks good, because I was also
comparing my findings with a similar dump from my USB key (which has
ext2, and luckily ext2 and
ext3 share those structures), and
also I spot something that looks like a
This candidate's address is
0x80731038. I subtract
0x38 and I get
0x80731000, a nice round number for a superblock. Converted to decimal, that's
2.155.024.384, some 2GiB from the disk's beginning. Looks really good!
The swap partition could be before the root one, and could have that size.
I run fdisk /dev/sda to see the (still empty) partition table. It says there are
16.065 sectors per cylinder, times
512 bytes per sector, equals
8.225.280 bytes per cylinder. Almost all distros (actually I think all of them) partition
disks at cylinder boundaries, so if the sector I found is right at the
beginning of a cylinder...
Damn! I almost had it... Hmm, how much is the fractional part?
1024! Is it that...?
I run strace debugfs -R show_super_stats /dev/sdb1 (the partition
on my USB key) and I see that it actually seeks
1024 bytes into the
partition before reading the superblock!
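The arithmetic is easy to replay. A small Python sketch, using only the numbers quoted in this post (the constant names are mine):

```python
MAGIC_ADDRESS = 0x80731038      # where the 53 ef bytes were found
MAGIC_OFFSET = 0x38             # magic's offset inside the superblock
SUPERBLOCK_OFFSET = 1024        # superblock's offset inside the partition
SECTORS_PER_CYLINDER = 16065    # as reported by fdisk
BYTES_PER_SECTOR = 512

superblock = MAGIC_ADDRESS - MAGIC_OFFSET
bytes_per_cylinder = SECTORS_PER_CYLINDER * BYTES_PER_SECTOR
cylinders, remainder = divmod(superblock, bytes_per_cylinder)
partition_start = superblock - SUPERBLOCK_OFFSET

print(hex(superblock), superblock)            # 0x80731000 2155024384
print(bytes_per_cylinder)                     # 8225280
print(cylinders, remainder)                   # 262 1024
print(partition_start % bytes_per_cylinder)   # 0
```

So the superblock sits exactly 1024 bytes after a cylinder boundary, which is what you would expect if the partition itself starts on that boundary.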
This is it. With 262 in my head, I fire up
fdisk /dev/sda and I create two
partitions: swap in cylinders 1-261 and root from cylinder 262 till the end. I
save, cross my fingers and I run
debugfs -R show_super_stats /dev/sda1. It fails! What's wrong? I reboot and I
try again, just in case the
kernel did not re-read the partition table correctly. It fails again. WTF?
Ah, duh, it's
sda2, where is my head... Ok,
show_super_stats /dev/sda2... it works, the sonofabitch works! I can't believe
it. I risk it:
fsck -n /dev/sda2. «Filesystem is clean». Damn, I try harder:
fsck -n -f /dev/sda2...
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/sda2 etc etc...
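As a rough illustration of what a tool has to read before it can say anything about the filesystem, here is a Python sketch that pulls a few fields out of the superblock, 1024 bytes into the partition. It assumes the standard ext2/ext3 layout (inode count at offset 0, block count at 4, log block size at 24, magic at 0x38, state right after it); the function name and the returned dictionary are invented for this example, and it is in no way a replacement for debugfs or fsck.

```python
import struct

SUPERBLOCK_OFFSET = 1024  # the superblock starts 1024 bytes into the partition


def read_superblock_summary(partition: bytes):
    """Decode a handful of ext2/ext3 superblock fields from the first
    2 KiB (or more) of a partition."""
    sb = partition[SUPERBLOCK_OFFSET:SUPERBLOCK_OFFSET + 1024]
    if len(sb) < 60:
        raise ValueError("not enough data to hold a superblock")
    inodes, blocks = struct.unpack_from("<II", sb, 0)
    (log_block_size,) = struct.unpack_from("<I", sb, 24)
    magic, state = struct.unpack_from("<HH", sb, 0x38)
    return {
        "is_ext": magic == 0xEF53,
        "inodes": inodes,
        "blocks": blocks,
        "block_size": 1024 << log_block_size,
        # bit 0 of the state field is set when the filesystem was cleanly unmounted
        "clean": bool(state & 1),
    }
```

If `is_ext` comes back false for the partition you just created, the start of the partition is wrong (or, ahem, you are looking at the swap partition).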
It's fine! But the MBR doesn't have GRUB installed, so I do the usual GRUB reinstall process, I reboot...
It boots like nothing has happened, and it finishes in a beautiful login. Satisfied, I pat myself on the back, pack my things and I start my weekend.
 ... wasting some 8MiB between the MBR and the first partition.
 The sharp ones reading this will notice that this cannot give an integer by any means.
 Reiser magics are funny. Looks like he started the fad that nowuse.
Today I decided to upgrade my home server (the one that serves this blog) from lenny to squeeze. Here is a 'log' of the experience.
My first mistake was on the name: it is not
Second, the server was once also a minimal desktop, so I uninstalled a lot of
desktop software to make the upgrade smaller and easier. I simply used my favorite
package manipulation tool,
dselect, and selected for purging all the
optional and extra packages in sections libs, python and perl. When the
consequences were shown to me, I just marked as install the software I wanted.
After that, ~450 packages were removed.
mdione@cobra:~$ sudo apt-get upgrade
[...]
233 upgraded, 0 newly installed, 0 to remove and 142 not upgraded.
Need to get 67.2MB of archives.
After this operation, 11.5MB of additional disk space will be used.
[...]
I have apt-listbugs installed, so I got this question just before accepting:
serious bugs of libslang2 (2.1.3-3 -> 2.2.2-4) <marked as done in some version>
 #615909 - Copyright file does not clearly state licence terms (Fixed: slang2/2.2.3-1)
serious bugs of deborphan (1.7.27 -> 18.104.22.168) <marked as done in some version>
 #618895 - orphaner enteres infinite loop on sparc (Fixed: deborphan/22.214.171.124)
critical bugs of initscripts (2.86.ds1-61 -> 2.88dsf-13.1) <unfixed>
 #612594 - On boot thw wait have no job to wait for, and fail into reboot.
serious bugs of libpcre3 (7.8-2 -> 8.02-1.1) <unfixed>
 #616660 - /usr/bin/pcretest must not be shipped in libpcre3
Summary:
 libpcre3(1 bug), libslang2(1 bug), deborphan(1 bug), initscripts(1 bug)
Are you sure you want to install/upgrade the above packages? [Y/n/?/...]
The critical bug for
initscripts looked ugly, so I checked it in more depth.
It seemed to affect
usplash, which I don't use (no use in a headless server,
right?), so I bit the bullet and continued. The bug's discussion said that
the solution was to purge usplash anyway... The rest of the bugs were,
pragmatically speaking, not interesting to me.
apt-listchanges showed me the unread entries in the NEWS
files for the upgraded packages, with no news that applied to my server.
Interesting was the split of
pam_cracklib into itself and
so now we can test the reuse of passwords without checking for dictionary
attacks, in the strange case one would want to do so, or the other way around.
That means that if you want both, you have to enable both.
Besides some conffile resolution (it would be nice to be able to resolve diffs
with xxdiff), the upgrade went smoothly.
Upgrading the kernel was painless too. The kernel NEWS included a
note on the renaming of PATA devices due to the new SCSI/PATA drivers, but I was aware of
that because I read about the upgrade issues first :) I just had to tell
linux-base to please update the disk devices in my config files, which is the
udev was a little bit more bumpy:
critical bugs of udev (0.125-7+lenny3 -> 164-3) <unfixed>
 #593083 - udev - system hangs at login screen
serious bugs of util-linux (126.96.36.199-1 -> 2.17.2-9) <unfixed>
 #613592 - /sbin/fdisk: Can't create at sector 63
 #613589 - /sbin/cfdisk: Bad Table error after fresh Squeeze install
The first one seemed to be something quite manageable, and the other two were not interesting to me. The bullet just needed a little bit more squeezing, that's all (Hahahahaaa! I told a joke! I can do this crap too!).
For the last step I used
dselect again, as I love the way it presents
dependency resolution. I took the opportunity to purge all the obsolete packages,
and I got no dependency problem with that, which means the upgrade should be
complete and smooth. This last step meant:
128 upgraded, 68 newly installed, 32 to remove and 0 not upgraded.
Need to get 96.5MB of archives.
After this operation, 33.1MB disk space will be freed.
Yeah, smooth indeed, except for these:
serious bugs of wget (1.11.4-2+lenny2 -> 1.12-2.1) <marked as done in some version>
 #614373 - wget: mixes dpatch and 3.0 (quilt) (Fixed: wget/1.12-3)
serious bugs of lvm2 (2.02.39-8 -> 2.02.66-5) <marked as done in some version>
 #603710 - root and swap devices on lvm do not correctly show up in udev (missing symlinks) (Fixed: lvm2/2.02.84-1)
   Merged with: 593625
serious bugs of libgssapi-krb5-2 ( -> 1.8.3+dfsg-4) <marked as done in some version>
 #611906 - GSSAPI in krb5 1.8 fails to delegate credentials to (Fixed: krb5/1.8.3+dfsg-5)
grave bugs of libdpkg-ruby1.8 (0.3.2 -> 0.3.6+nmu1) <unfixed>
 #585448 - Leaves files open as it scans, resulting in too many open files
   Merged with: 600260
grave bugs of dash (0.5.4-12 -> 0.5.5.1-7.4) <unfixed>
 #540512 - dash upgrade breaks mksh-as-/bin/sh
 #538822 - dash fails to upgrade if /bin/sh is locally diverted
grave bugs of grub-pc ( -> 1.98+20100804-14) <unfixed>
 #593648 - grub-pc install fails on RAID1 (unknown filesystem)
 #590884 - grub-pc: upgrading with vmlinuz-2.6.32-5-amd64 kernel fails on device detection
 #612220 - after update to squeeze grub2 don't load the system
 #620663 - grub-pc hangs after upgrading lenny to squeezy
grave bugs of openssh-client (1:5.1p1-5 -> 1:5.5p1-6) <unfixed>
 #607267 - /usr/bin/scp: fails to notice close() errors
grave bugs of elinks (0.12~pre2.dfsg0-1 -> 0.12~pre5-2) <unfixed>
 #617713 - Caches documents in violation of HTTP spec and general sanity
serious bugs of apt (0.7.20.2+lenny2 -> 0.8.10.3) <unfixed>
 #558784 - apt: re-adds removed keys
serious bugs of python2.5 (2.5.2-15+lenny1 -> 2.5.5-11) <unfixed>
 #598372 - python2.5: uses the network during build
serious bugs of lvm2 (2.02.39-8 -> 2.02.66-5) <unfixed>
 #603036 - lvm2: fails to install due to incorrect dependencies in init.d LSB header
serious bugs of munin (1.2.6-10~lenny2 -> 1.4.5-3) <unfixed>
 #619399 - munin shouldn't recreate apache conf on every update
serious bugs of grub (0.97-47lenny2 -> 0.97-64) <unfixed>
 #594283 - grub: non-standard gcc/g++ used for build (gcc-4.3)
serious bugs of insserv ( -> 1.14.0-2) <unfixed>
 #598020 - barfs when there are "invalid" init scripts
I focused on the
grub-pc bugs, because I don't have a monitor and having the
server unbootable is not my idea of a fun way to spend my afternoon. 612220
and 620663 seemed the gravest ones, so I checked them. The latter seemed more
complicated, so I just added another bullet in my mouth and continued. The rest
of the bugs seemed harmless enough for me. The NEWS had nothing either. The
moment of truth was coming closer.
In the Debconf part, I chose to put
grub-pc in the boot sector of the
root partition and chainload it from
grub-legacy which is still installed in
the MBR. I also kept the old kernel, just in case.
With a mouthful of bullets in different states of chewedness, I rebooted the server. As expected, nothing happened; that is, it booted fine, no problems, all services still there. Disappointed, I'm now looking intently at a more complicated server for an upgrade.
 yes, I know
dselect is in maintenance mode now, and that it knows nothing
about automatic packages, but I mostly know what I need and what not. In any
case, what's missing can be installed later. Nothing critical is run in this
 If you haven't, you really have to see Ahmed, the suicide terrorist