Restoring file ownership and permissions

Note: TL;DR version at the end.

What could go wrong when running:

chown -R foo.foo $DATA_DIR/

as root? Yes, exactly: $DATA_DIR might not be defined, and you end up setting the owner and group to foo.foo for every file on the machine, including executables and devices. Of course, I learnt that the hard way. Even worse, I lost the only ssh session I had on the 3 machines where I did this (mental note: be very, very, very careful when using cssh or similar tools), but luckily I still had a fourth twin sister which survived.
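One way to defuse this particular mistake is to refuse to run when the target is empty or turns out to be /. What follows is only a minimal sketch, assuming a POSIX-style shell; the helper name safe_chown is made up for illustration and is not what I used at the time.

```shell
#!/bin/sh
# Hypothetical helper: refuse to run chown -R when the target directory
# is unset, empty, or resolves to the root directory.
safe_chown() {
    owner=$1
    dir=${2:-}
    if [ -z "$dir" ]; then
        echo "safe_chown: target directory is unset or empty" >&2
        return 1
    fi
    # Resolve to a physical absolute path (in a subshell, so the caller's
    # working directory is untouched) to catch things like "/tmp/..".
    resolved=$(cd "$dir" 2>/dev/null && pwd -P) || {
        echo "safe_chown: cannot enter $dir" >&2
        return 1
    }
    if [ "$resolved" = "/" ]; then
        echo "safe_chown: refusing to operate on /" >&2
        return 1
    fi
    chown -R "$owner" "$resolved"
}
```

With this, safe_chown foo:foo "$DATA_DIR/" fails loudly instead of walking the whole filesystem. A lighter alternative is to write the expansion as "${DATA_DIR:?}", which makes the shell stop with an error when the variable is unset or empty.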

The first thing you notice is that you can't ssh into the machine anymore. This is because sshd refuses to accept any connection, with this message: fatal: /var/run/sshd must be owned by root and not group or world writable.

The first step is to restore some initially sane ownership. The safest is root.root, but unluckily I didn't realize this until after I had lost said ssh sessions. So actually, the first step is to regain control of the machine. The only way I can think of is to log in via the console. I thought this would also imply rebooting into single-user mode, or even using the init=/bin/bash hack, which has saved more than one life, or at least sanity, but login keeps working even when the perms are wrong. I seemed to remember otherwise, probably because I thought login was setuid, but sudo find / -perm -4000 confirms that it is not.

So, after logging in, I need 2 commands: chown -R root.root / and /etc/init.d/ssh start (because ssh no longer starts at boot time, seemingly leaving no error message behind), and now you can log in via ssh again.

Now it's time to restore the real ownership. A small package called acl comes in handy here. It ships two tools, getfacl and its evil twin setfacl. With these we're really close to the final solution. On a healthy machine, getfacl --recursive / > all.acl dumps the ownerships; then you copy all.acl to the sick server(s) and apply it like a cure-all medicine: setfacl --recursive --restore /root/all.acl while being in the root (/) directory, because getfacl strips the leading / from the paths it records. Actually, think of it as a blood or bone marrow transplant.

As final notes, don't use getfacl's --tabular option, as setfacl doesn't recognize the format. Also, you can check that the ownerships were correctly restored with find / ! -user root | xargs ls -ld, or you can dump the new perms and compare them with those you got from the donor machine.
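The dump-and-compare check could be sketched like this. It is only an illustration, assuming both dumps are in getfacl's default (non-tabular) format, where every entry starts with "# file:", "# owner:" and "# group:" comment lines; the function names, and new.acl in the note below, are hypothetical.

```shell
#!/bin/sh
# Reduce a getfacl dump to sorted, tab-separated "path owner group" lines.
# Assumes the default getfacl output, where each entry carries
# "# file:", "# owner:" and "# group:" comment lines in that order.
owners_from_dump() {
    awk '
        BEGIN        { OFS = "\t" }
        /^# file: /  { path  = substr($0, 9) }
        /^# owner: / { owner = substr($0, 10) }
        /^# group: / { print path, owner, substr($0, 10) }
    ' "$1" | sort
}

# Show entries whose owner or group differs between two dumps.
# In the diff output, "<" lines come from the first dump and ">" lines
# from the second. Returns 0 when the ownerships are identical.
diff_owners() {
    a=$(mktemp) && b=$(mktemp) || return 2
    owners_from_dump "$1" > "$a"
    owners_from_dump "$2" > "$b"
    diff "$a" "$b"
    status=$?
    rm -f "$a" "$b"
    return $status
}
```

After dumping the repaired machine into, say, new.acl with the same getfacl command, diff_owners all.acl new.acl lists what still differs. Expect some noise from files that legitimately exist on only one of the two machines.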

Update

I gave the problem a little more thought and came to the conclusion that with that method you have no idea whether you restored all the ownerships properly or not. Taking advantage of the fact that I actually changed both owner and group, I can replace the first command with chown -R root /, and then find the not-undone files with find / -group foo. I haven't tested this, but I think it's OK. Another option is to initially restore with chown only the minimal set of files needed to make ssh work again, and then find all the foo.foos.
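The hunt for leftovers might be wrapped up like this. Again just a sketch: the function name is invented, and the -xdev flag is my own assumption that you want to stay on one filesystem and keep find out of places like /proc; drop it if your data lives on separate mounts.

```shell
#!/bin/sh
# Hypothetical helper: list everything under a directory that still
# belongs to the given group (name or numeric gid).
# -xdev keeps find on the starting filesystem; this is an assumption,
# remove it to descend into other mounted filesystems as well.
list_group_leftovers() {
    find "$1" -xdev -group "$2" -print
}
```

Running list_group_leftovers / foo after the restore should ideally print only the files that really belong to foo; anything else has to be fixed by hand.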

So, the promised TL;DR version:

  • On the sick machine(s), log in via the console and run:
      • chown -R root /
      • /etc/init.d/ssh start
  • Select a donor machine and run:
      • getfacl --recursive / > all.acl
  • Transplant all.acl to the sick machine(s) (most probably with scp).
  • On the sick machine(s), now via ssh if it's more comfortable for you, run:
      • cd /
      • setfacl --recursive --restore /root/all.acl
  • Run find / -group foo and change by hand any remaining not-undone files.