Note: this is a translation of an old post. I decided to translate it because now I'm looking for a SysAdmin position (tell your friends!) and I would like this post to show how I work.

Last Saturday I received an email from one of the guys from work with the subject «urgennnnnnnnt: heeeeeeeeeelp»[sic]. He says he was idling on Friday night when his machine stopped emitting sound via the soundcard and then started behaving erratically. When he tried rebooting it, it wouldn't boot anymore. «It says something about disk not bootable...».

Monday morning I go to work and go to see the machine. Sure enough, it said something about «disk not bootable». I boot from a USB key with GRML and I find that the disk has no partitions.

Panic.

The guy is doing a PostDoc in something astronomical (literally) and all his work is in that machine. No backups, as usual, so I prepare myself to rescue the partitions.

In that same USB key I have a system with parted. I boot with it and I try using parted's rescue tool. Nothing. I ask the guy how the disk was partitioned, etc. He tells me that he only installed Kubuntu clicking 'Next'. Kubuntu by default creates a swap partition and an ext3 partition for / and that's it, which made what was coming relatively easy.

I reboot into GRML and I use hexdump -C /dev/sda | more to see the disk's content. This is not the first time that I have juggled with partitions and MBRs, but the last time I did it, I used a tool that is now discontinued (the tool was called DiskEdit, included in The Norton Utilities), which had special edit modes for MBRs, FATs, and a lot of useful things... in the MS universe.

First I confirm that, yes, the first sector is an MBR (at least it has the 0x55aa signature at the end), and that the whole partition table is empty, but in the second sector of the disk there seems to be a copy. I take pen and paper and write down what I found, but it turns out that not only do I have just half the data, the partition I thought I had found is also too small.
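As an illustration, the same check can be scripted. This is a minimal Python sketch, not the procedure used here (that was done by eye on the hexdump output). It assumes the classic MBR layout: four 16-byte partition entries starting at offset 446, and the 0x55 0xaa signature in the last two bytes of the 512-byte sector. The name parse_mbr is made up for the example.

```python
import struct

SECTOR_SIZE = 512
TABLE_OFFSET = 446         # the partition table starts here in a classic MBR
ENTRY_SIZE = 16
SIGNATURE = b"\x55\xaa"    # last two bytes of the sector


def parse_mbr(sector):
    """Return (has_signature, entries) for a 512-byte first sector.

    entries holds (type, first_lba, sector_count) for each non-empty slot.
    """
    if len(sector) != SECTOR_SIZE:
        raise ValueError("expected exactly 512 bytes")
    has_signature = sector[510:512] == SIGNATURE
    entries = []
    for i in range(4):
        start = TABLE_OFFSET + i * ENTRY_SIZE
        raw = sector[start:start + ENTRY_SIZE]
        # status, CHS start, type, CHS end, first LBA, sector count (little endian)
        _status, _chs_start, ptype, _chs_end, first_lba, count = struct.unpack(
            "<B3sB3sII", raw)
        if ptype != 0 or count != 0:
            entries.append((ptype, first_lba, count))
    return has_signature, entries
```

For a sector like the one described above (valid signature, empty table), parse_mbr would return (True, []).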

So I decide to look for the partition by hand. To do that I first needed to find out how the ext3 kernel code knows whether a partition is ext3 or not. I knew it would be some kind of magic signature, but I had no idea which. So I installed the sources for 2.6.29 on my laptop and started to look at ext3's code. After going around a lot, I ended up reading the code that is executed when you mount a filesystem of type ext3, where we can see that it uses a magic signature[3], and the structure of the ext3 superblock, where we can see that the magic's offset is 0x38.

So the problem of finding an ext3 partition is reduced to the problem of finding the bytes 53 ef (the magic is 0xEF53; damn little endian) at offset 0x38 of some sector on the disk. Luckily more has a find tool, so I sit down to search for every occurrence of 53 ef, hoping that the address at the left ends in 30 and that they are the 9th and 10th bytes in the line (damn 0 based offsets).
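The same search can be sketched in a few lines of Python. This is only an illustration under the assumptions stated in the text (512-byte sectors, magic bytes 53 ef at offset 0x38 of a sector); the function name find_ext_magic is invented for the example, and any hit is just a candidate that still needs to be checked by eye, since those two bytes can appear by chance.

```python
import io

SECTOR_SIZE = 512
MAGIC_OFFSET = 0x38          # offset of the magic inside the superblock
MAGIC_BYTES = b"\x53\xef"    # 0xEF53 as stored on disk (little endian)


def find_ext_magic(stream):
    """Yield the absolute address of every sector-relative 0x38 offset
    that holds the ext2/ext3 magic bytes."""
    address = 0
    while True:
        sector = stream.read(SECTOR_SIZE)
        if len(sector) < SECTOR_SIZE:
            break
        if sector[MAGIC_OFFSET:MAGIC_OFFSET + 2] == MAGIC_BYTES:
            yield address + MAGIC_OFFSET
        address += SECTOR_SIZE


if __name__ == "__main__":
    # Fake 4-sector "disk" with the magic planted in the third sector.
    disk = bytearray(4 * SECTOR_SIZE)
    pos = 2 * SECTOR_SIZE + MAGIC_OFFSET
    disk[pos:pos + 2] = MAGIC_BYTES
    print([hex(a) for a in find_ext_magic(io.BytesIO(bytes(disk)))])
    # prints: ['0x438']
```

On a real disk the stream would be the device opened in binary mode instead of the in-memory fake used here.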

A few 'next's later, I get my first candidate. It looks good, because I was also comparing my findings with a similar dump from my USB key (which I have formatted as ext2, and luckily ext2 and ext3 share those structures), and I also spot something that looks like a uuid.

This candidate's address is 0x80731038. I subtract 0x38 and I get the address 0x80731000, a nice round number for a superblock. Converted to decimal that's 2.155.024.384, some 2GiB from the disk's beginning. Looks really good! The swap partition could be before the root one, and could have that size.

I use fdisk /dev/sda to see the (still empty) partition table. It says there are 16.065 sectors per cylinder, times 512 bytes per sector, equals 8.225.280 bytes per cylinder. Almost all distros (actually I think all of them) partition disks at cylinder boundaries[1], so if the sector I found is right at the beginning of a cylinder...

I divide 2.155.024.384/8.225.280=...

(suspense pause)[2]

262.000124494...

Damn! I almost had it... Hmm, how much is the fractional part? (262.000124494-262)*8.225.280=... 1024! Could it be that...?

I run strace debugfs -R show_super_stats /dev/sdb1 (the partition in my USB key) and I see that it actually seeks 1024 bytes within the partition for reading the superblock!
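The arithmetic of the last few paragraphs can be replayed with a short Python sketch. The numbers are the ones from the text (16.065 sectors per cylinder, 512 bytes per sector, magic found at 0x80731038, superblock 1024 bytes into the partition); the function name locate_partition is made up for the example.

```python
SECTORS_PER_CYLINDER = 16065   # as reported by fdisk in the story
BYTES_PER_SECTOR = 512
SUPERBLOCK_OFFSET = 1024       # the superblock sits 1024 bytes into the partition
MAGIC_OFFSET = 0x38            # the magic sits 0x38 bytes into the superblock


def locate_partition(magic_address):
    """Map the address of a magic hit to
    (full_cylinders, leftover_bytes, partition_start)."""
    superblock = magic_address - MAGIC_OFFSET
    bytes_per_cylinder = SECTORS_PER_CYLINDER * BYTES_PER_SECTOR
    full_cylinders, leftover = divmod(superblock, bytes_per_cylinder)
    partition_start = superblock - SUPERBLOCK_OFFSET
    return full_cylinders, leftover, partition_start


cylinders, leftover, start = locate_partition(0x80731038)
print(cylinders, leftover, start % (SECTORS_PER_CYLINDER * BYTES_PER_SECTOR))
# prints: 262 1024 0
```

The quotient counts the full cylinders that precede the superblock, the remainder is exactly the 1024 bytes that debugfs seeks, and the partition start computed from them falls on a cylinder boundary.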

This is it. With 262 in my head, I fire up fdisk /dev/sda and I create two partitions: swap in cylinders 1-261 and root from cylinder 262 till the end. I save, cross my fingers and I run debugfs -R show_super_stats /dev/sda1. It fails! What's wrong? I reboot and I try again, just in case the kernel did not re-read the partition table correctly. It fails again. WTF?

Ah, duh, it's sda2, where do I have my head... Ok, debugfs -R show_super_stats /dev/sda2... it works, the sonofabitch works! I can't believe it. I risk it: fsck -n /dev/sda2. «Filesystem is clean». Damn, I try harder: fsck -n -f /dev/sda2...

Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/sda2 etc etc...

It's fine! But the MBR doesn't have GRUB installed, so I do the usual GRUB reinstall process, I reboot...

It boots like nothing has happened, and it finishes in a beautiful login. Satisfied, I pat myself on the back, pack my things and I start my weekend.

sysadmin rescue


[1] ... wasting some 8MiB between the MBR and the first partition.

[2] The sharp ones reading this will notice that there is no way this can give an integer.

[3] Reiser magics are funny. Looks like he started the fad that now AdOlEsCeNtS use.

Posted Thu 07 Apr 2011 11:46:15 PM CEST Tags:
01
Posted Wed 06 Apr 2011 08:28:18 PM CEST

For the last year and a half I was working in research. This position was about, among other things, porting a programming language (two, actually, Hop and Bigloo) to the Android platform. I had already written something about it, but this time I want to show my high level impressions of the platform. What follows is part of a report I wrote at the end of that job, which includes the things I wanted to say.

The Android port can be viewed as four separate subtasks. Hop is software developed in the Scheme language; more particularly, it must be compiled with the Bigloo Scheme compiler, which in turn uses gcc for the final compilation. That means that we first needed to port Bigloo to the platform, not because we were planning to use it on the platform, but because we needed the Bigloo runtime libraries ported to Android, as Hop and any other program compiled with Bigloo uses them. The other three subtasks, which are discussed later, are porting Hop itself; developing libraries to access devices and other features present in the platform; and, for reasons we'll see later, making the port work with threads.

When we started to investigate how to port native code to the platform we found that there wasn't much support. At first the only documentation we could find was blog posts by people trying to do it by hand. They were using the compiler provided in Android's source code to compile static binaries that could be run on the platform. Because Bigloo uses dynamic libraries to implement platform dependent code and modules, we aimed to find a way to compile things dynamically. After 3 or 4 weeks we found a wrapper written in Ruby that managed all the details of calling gcc with the proper arguments. With this we should be able to port anything that uses gcc as the compiler, just like Bigloo does. At the same time, the first version of Android's NDK (Native Development Kit) appeared, but it wasn't easy to integrate into our build system.

(Note: Actually I think most of the problems we faced doing this port stem from this. The NDK forces you to write a new set of Makefiles, but our hand-made build system and build hierarchy made such an effort quite big. Also, that means supporting a parallel build system, while it should not be so crazy to expect a cleaner way to integrate the toolchain into an existing build system, not only hand-made ones like in this case, but also the most common ones, like autotools, cmake, etc.)

Even having the proper compiler, we found several obstacles related to the platform itself. First of all, Bigloo relies heavily on the C and pthread libraries to implement low-level functionality. Bigloo can use either glibc, GNU's implementation, or µClibc, an implementation aimed at embedded applications. Bigloo also relies on Boehm's Garbage Collector (GC) for its memory management. The C library implementation in Android is neither glibc nor µClibc, but an implementation developed by Google for the platform, called Bionic. This version of the C library is tailored to the platform's needs, with little to no regard for native application development.

The first problem we found is that GC compiled fine with Bionic, but the applications that used GC did not link: there was a missing symbol that is normally defined in the libc, but that Bionic did not define. We tried cooperating with the GC developers, and we tried inspecting a Mono port to Android, given that this project also uses GC, trying to find a solution that could be useful for everyone, but in the end we just patched our sources to fake that symbol with a value that remotely made sense.

We also found that Bionic's implementation of pthreads is not only incomplete, but also has some glitches. For instance, in our build system, we test for the existence of a function like everybody else: we compile a small test program which uses it. With this method we found at least one function that is declared but never defined. That means that Bionic declares that the function exists, but then never implements it. Another example is a function that is both declared and defined, while the constants normally used when calling it are not.

Also, because most of the tests also must be executed to inform about the peculiarities of each implementation, we had to change our build system to be able to execute the produced binaries in the Android emulator.

Google also decided to implement their own set of tools, again trimmed down to the needs of the platform, instead of using old and proven ones, like BusyBox. This means that some tools behave differently, with no documentation about it, so we mostly had to work around these differences every time a new one appeared.

All in all, we spent two and a half months just getting Bigloo to run on Android, leaving aside the problem that Boehm's GC, which uses its own build system, detected that the compiler declared no support for threads, and refused to compile with threads enabled. This meant that Bigloo itself could not be compiled with pthreads support.

With this caveat in mind, we tackled the second subtask, porting Hop itself. This still raised problems with the peculiarities of the platform. We quickly found that the dynamic linker wasn't honoring the LD_LIBRARY_PATH environment variable, which we were trying to use to tell the system where to find the dynamic libraries.

The Android platform installs new software using a package manager. The package manager creates a directory in the SD card that is only writable by the application being installed. Within this directory the installer puts the libraries declared in the package. Bigloo, besides the dynamic libraries, requires some additional files that initialize the global objects. These files are not extracted by the installer, so we had to make a frontend in Java that opens the package and extracts them by hand. But the installer creates the directory for the libraries in such a way that the application cannot write to it later.

Also, we found that the dynamic linker works for libraries linked at runtime, but not when dlopen()'ing them, so we also had to rewrite a great part of our build system for both Bigloo and Hop to produce static libraries and binaries. This also meant disabling the dynamic loading of libraries and, with it, their initialization, so we had to initialize them by hand.

To add more unexpected work, the Android package builder, provided with the SDK, ignores hidden files, which Bigloo uses to map Scheme module names to dynamic libraries. We had to work around this feature in the unpacking algorithm.

Then we moved on to improving the friendliness of the frontend. So far, we could install Hop on the platform, either on a phone or in the emulator, but we could only run it in the emulator, because we were using a shell that runs as root on the emulator, but that runs as a user on a real device. This user, for the reasons given above, cannot even get into Hop's install dir. Even though Android has a component interface that allows applications to use components from other apps, none of the terminal apps we found at that time declared the terminal itself as a reusable component. We decided to use the code from the most popular one, which was based on a demo available in Android's source code, but not installed on actual devices. We had to copy the source code and trim it down to our needs.

Having a more or less usable Hop package for Android, we decided to try and fix the issue we mentioned before: GC didn't compile with threads enabled. This means that we can't use the pthreads library, which is very useful for Hop. Hop uses threads to serve several requests at the same time. Bigloo implements two threads APIs, one based on pthreads and another which implements fair threads. Hop is able to use 5 different request schedulers, but works better with the one based on pthreads.

For these reasons we decided to focus on getting GC to use threads on the Android platform. GC's build system tests for the existence of a threading platform by checking the thread model declared by gcc. The gcc version provided with Android's SDK declares that it has a 'single thread model', but we couldn't find out what this means in terms of the code produced by gcc or how this could affect GC's execution.

(Note: we didn't manage to make GC compile with threads.)

With a threadless Hop running, we had to add code to the server so that the server and the frontend could talk to each other while the server is attending the requests from a web client. After several attempts to attack this problem, we decided that the best solution was to make this interface another service served by Hop. This meant fewer modifications to Hop itself, but a bigger one to the frontend we already had.

During these changes we found a problem with JNI. The terminal component we imported into our code uses a small C library for executing the application inside (normally a shell in the original code, but Hop in our case), which is accessed from Java using JNI. The original Term application exported this class as com.android.term.Exec, but our copy exported it as fr.inria.hop.Exec. Even with this namespace difference JNI got confused and tried to use the Exec class from the original Term app. This is just another example of how hard the platform is to work with. We found that the community support is more centered around Java and that very few people know about JNI, the NDK or any other native related technologies. We couldn't find an answer to this problem, so we worked around it by renaming the class.
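As background, native code refers to Java classes by their fully qualified names, and when a native method is looked up by symbol name, the JNI specification derives that name from the package, class and method. The Python sketch below reproduces that naming rule for illustration only. Whether this mechanism played any part in the confusion described above is not known; the method name createSubprocess is a placeholder chosen for the example, and jni_mangle and jni_symbol are made-up helper names.

```python
def jni_mangle(name):
    """Escape one name component following the JNI symbol naming rules."""
    out = []
    for ch in name:
        if ch in "./":
            out.append("_")            # package separators become underscores
        elif ch == "_":
            out.append("_1")
        elif ch == ";":
            out.append("_2")
        elif ch == "[":
            out.append("_3")
        elif ch.isascii() and ch.isalnum():
            out.append(ch)
        else:
            # Anything else is written as _0xxxx per UTF-16 code unit,
            # in lower case hex.
            units = ch.encode("utf-16-be")
            for i in range(0, len(units), 2):
                out.append("_0%02x%02x" % (units[i], units[i + 1]))
    return "".join(out)


def jni_symbol(class_name, method_name):
    """Build the native symbol name for a non-overloaded native method."""
    return "Java_%s_%s" % (jni_mangle(class_name), jni_mangle(method_name))


# "createSubprocess" is a hypothetical method name used for illustration.
print(jni_symbol("com.android.term.Exec", "createSubprocess"))
print(jni_symbol("fr.inria.hop.Exec", "createSubprocess"))
# prints: Java_com_android_term_Exec_createSubprocess
# prints: Java_fr_inria_hop_Exec_createSubprocess
```

The point of the sketch is only that the package name is baked into what the native side looks for, so code copied from one package to another has to be adjusted on both the Java and the C side.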

So that's it. I can provide all the technical details for most of the assertions I made above, but that would make this post unreadable because of its length. If you have any questions about them, just contact me.

android

Posted Sat 26 Mar 2011 12:12:39 AM CET Tags:

After several months thinking about it, and just two requests, I finally decided to publish satyr's code. I decided to use github because I had already switched satyr from hg to git, mainly for testing and understanding it. I think I can live with hg, although branch management in git seems to have been given more thought and a good implementation.

So, without further ado: satyr in github

Remember, it's still test software, by no means polished or ready for human consumption, and with very little development effort behind it. Still, I think it has some nice features, like interchangeable skins and a well defined backend, D-Bus support, quick tag editing, reasonable collection management, and, thanks to Phonon, almost-gapless playback and things like «stop after playing the current file» (but not «after any given file» yet).

In Debian Sid it mostly works only with the GStreamer backend; I haven't tried the xine one and I know VLC does not emit a signal needed for queueing the next song, so you have to press «next» after each song. AFAIK this is fixed upstream.

satyr pykde python

Posted Sat 03 Sep 2011 02:59:43 PM CEST Tags: