Restoring file ownership and permissions
Note: TL;DR version at the end.
What could go wrong when running
chown -R foo.foo $DATA_DIR/
as root? Exactly: $DATA_DIR might not be defined, in which case the argument expands to a bare / and you end up setting the owner and group to foo.foo for every file on the machine, including executables and devices. Of course, I learnt that the hard way. Even worse, I lost the only ssh session I had on the 3 machines where I did this (mental note: be very, very, very careful when using cssh or similar tools), but luckily I still had a fourth twin sister which survived.
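One common shell guard against this kind of accident is the ${VAR:?} parameter expansion, which makes the shell refuse to run the command when the variable is unset or empty. The sketch below is only an illustration: it uses a deliberately empty DATA_DIR and echoes the paths instead of calling chown, and the names unguarded_target and guard_result are made up for the demo.

```shell
# DATA_DIR is deliberately empty here to reproduce the failure mode.
DATA_DIR=""

# Unguarded: "$DATA_DIR/" silently expands to "/", the whole filesystem.
unguarded_target="$DATA_DIR/"
echo "unguarded target: $unguarded_target"

# Guarded: ${DATA_DIR:?message} aborts the expansion when DATA_DIR is unset
# or empty. The inner subshell keeps that abort from ending this demo.
if guarded_target=$( (echo "${DATA_DIR:?DATA_DIR is not set}/") 2>/dev/null ); then
    guard_result="expanded to $guarded_target"
else
    guard_result="refused"
fi
echo "guarded expansion: $guard_result"

# In a real script the guarded form would look like this (not executed here):
#   chown -R foo:foo "${DATA_DIR:?DATA_DIR is not set}/"
```

With the guard in place, the command fails with an error message instead of walking the whole machine.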
The first thing you notice is that you can't ssh into the machine anymore. This is because the ssh daemon refuses to accept any connection, with this message:
fatal: /var/run/sshd must be owned by root and not group or world writable.
The first step is to restore some initially sane ownership. The safest is root.root, but unluckily I didn't realize this until after I lost said ssh sessions. So actually, the first step is to regain control of the machine. The only way I can think of is to log in via the console. I thought that this would also imply rebooting into single-user mode or even using the init=/bin/bash hack, which has saved more than one life, or at least sanity, but login keeps working even when the perms are wrong. I seemed to remember that this was not the case, probably because I thought it was setuid, but sudo find / -perm -4000 confirms that it is not.
So, after logging in, I need 2 commands:
chown -R root.root /
and
/etc/init.d/ssh start
(because ssh no longer starts at boot time, seemingly leaving no error message behind). Now you can log in via ssh again.
Now it's time to restore the real ownership. A small package called acl comes in handy. It has two tools, getfacl and its evil twin setfacl. With these we're really close to the final solution. On a healthy machine,
getfacl --recursive / > all.acl
pulls the ownerships; then you copy all.acl to the sick server(s) and apply it like a cure-all medicine:
setfacl --recursive --restore /root/all.acl
while being in the root (/) directory, because getfacl strips the leading / from the paths it records. Actually, think of it as a blood or bone marrow transplant.
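To see the dump-and-restore round trip without touching a real system, here is a minimal sketch on a scratch directory. It is an illustration under assumptions: it scrambles permissions rather than ownership so that it does not need root, it uses the --restore=FILE spelling, and the names scratch, tree and round_trip are made up for the demo. If the acl tools are not installed it just records that it was skipped.

```shell
# Build a tiny tree with a known permission on one file.
scratch=$(mktemp -d)
mkdir "$scratch/tree"
touch "$scratch/tree/file"
chmod 640 "$scratch/tree/file"

if command -v getfacl >/dev/null 2>&1 && command -v setfacl >/dev/null 2>&1; then
    # Dump with relative paths, from the directory the restore will run in.
    (cd "$scratch" && getfacl --recursive tree > "$scratch/tree.acl")

    # Simulate the accident: wreck the permissions.
    chmod 777 "$scratch/tree/file"

    # Restore from the dump, again from the same starting directory.
    (cd "$scratch" && setfacl --restore="$scratch/tree.acl")

    # Keep only the 10-character mode string from ls -l.
    round_trip=$(ls -l "$scratch/tree/file" | cut -c1-10)
else
    round_trip="skipped"
fi
echo "mode after restore: $round_trip"
```

The dump is plain text, with "# file:", "# owner:" and "# group:" comment lines ahead of each entry's permission lines, so it is easy to inspect before applying it.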
As final notes, don't use getfacl's --tabular option, as setfacl doesn't recognize the format. Also, you can check that the ownerships were correctly restored with
find / ! -user root | xargs ls -ld
or you can dump the new perms and compare them with those you got from the donor machine.
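The comparison of the two dumps can be sketched with plain diff. This is only a sketch: compare_acl_dumps is a hypothetical helper name, and it assumes both dumps were taken with the same getfacl options from the same starting directory and list the files in the same order, which is not guaranteed across machines.

```shell
# Hypothetical helper: compare a donor dump with a dump of the repaired machine.
# $1 = dump from the donor machine, $2 = dump taken locally after the restore.
compare_acl_dumps() {
    donor_dump=$1
    local_dump=$2
    # diff prints the differing entries and returns non-zero when any exist.
    if diff -u "$donor_dump" "$local_dump"; then
        echo "dumps match"
    else
        echo "dumps differ"
        return 1
    fi
}
```

For example, after running getfacl --recursive / > /root/after.acl on the repaired machine, compare_acl_dumps /root/all.acl /root/after.acl would print the entries that still differ.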
Update
I gave the problem a little more thought and came to the conclusion that with that method you have no idea whether you restored all the ownerships properly or not. Taking advantage of the fact that I actually changed both owner and group, I can replace the first command with chown -R root /, which leaves the group as foo, and then find the not-yet-undone files with find / -group foo. I haven't tested this, but I think it's OK. Another option is to initially restore with chown only the minimal set of files needed to make ssh work again, and then find all the foo.foos.
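The search for leftovers could be wrapped as below. This is an untested-on-a-broken-machine sketch, like the idea itself: list_leftovers is a hypothetical helper name, and the -xdev option is my addition so that find stays on one filesystem and does not wander into /proc or /sys when started at /.

```shell
# Hypothetical helper: list everything under a directory that still belongs
# to the given group (name or numeric ID).
# $1 = directory to search, $2 = group to look for.
list_leftovers() {
    search_root=$1
    group_name=$2
    # -xdev: do not descend into other mounted filesystems.
    find "$search_root" -xdev -group "$group_name" -print
}
```

On the sick machine this would be called as list_leftovers / foo > /root/leftovers.txt, and the resulting list reviewed by hand. Other mounted filesystems would need their own run.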
So, the promised TL;DR version:
- On the sick machine(s), log in via the console and run:
chown -R root /
/etc/init.d/ssh start
- Select a donor machine and run:
getfacl --recursive / > all.acl
- Transplant all.acl to the sick machine(s) (most probably with scp).
- On the sick machine(s), now via ssh if it's more comfortable for you, run:
cd /
setfacl --recursive --restore /root/all.acl
- Run find / -group foo and change by hand any remaining not-undone files.