On Mon, 2 Apr 2007, Stephen Montgomery-Smith wrote:
Here is my guess. /home/sde1 is updated a lot. So the directories (each of
which consists of a data file containing the names of the files in it, along
with their associated inodes) are kept in cache - perhaps by the OS, but if not
then by the hard drive itself. So when the power goes out, the data in the
directory (i.e. the associations of the names with the inodes) is lost. All
you are left with is the name of the directory, which presumably holds a rather
old association of names with inodes - inodes which no longer point to
anything useful, and so in the fsck you presumably did after the crash these
names get removed.
That's a pretty good guess.
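To make the "directory is a table of names and inodes" idea concrete, here is a minimal Python sketch that prints that table for one directory. The helper name list_name_to_inode is made up for illustration; os.scandir and DirEntry.inode() are standard library calls. It only shows what a directory stores, not what any particular filesystem keeps in cache.

```python
import os

def list_name_to_inode(directory):
    """Return {file name: inode number} for the entries of one directory."""
    # Each directory entry is essentially a (name, inode) pair.
    with os.scandir(directory) as entries:
        return {entry.name: entry.inode() for entry in entries}

if __name__ == "__main__":
    # The current directory is used here only as a harmless default.
    for name, inode in sorted(list_name_to_inode(".").items()):
        print(inode, name)
```

If the on-disk copy of that table is stale after a crash, the names in it can point at inodes that are no longer valid, which is the situation described above.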
shawn parker wrote:
i have two cases, now, where data went mysteriously "missing" after a
crash or power failure.
crash 1:
server3 hard locked and had to be manually power-cycled. upon
successful startup, all files in /home/sde1 and /arcgis/sde1 were
missing. *not* the directory structure, just the data files in each
directory.
The inodes for /home/sde1 and /arcgis/sde1 were being modified in
memory. These changes were not flushed to disk when you had to power
cycle the system. The data was lost.
Check out /usr/share/doc/kernel-[whatever]/Documentation/sysrq.txt for
info on how to use the sysrq key sequences to sync the disk before doing
that.
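For applications that can't afford to lose a file this way, one common approach is to flush explicitly instead of waiting for the kernel to write things back. The following is a rough Python sketch, assuming a Linux system where fsync on a directory descriptor is supported; the function name durable_write and the ".tmp" suffix are made up for illustration. It does not replace the sysrq sync, and a drive with a volatile write cache may still lose data if it does not honor flush requests.

```python
import os

def durable_write(path, data):
    """Write bytes to path, then flush the file and its directory entry."""
    directory = os.path.dirname(os.path.abspath(path))
    tmp_path = path + ".tmp"  # hypothetical temp-file naming
    with open(tmp_path, "wb") as f:
        f.write(data)
        f.flush()             # push Python's buffer to the kernel
        os.fsync(f.fileno())  # ask the kernel to write the file data to disk
    os.replace(tmp_path, path)  # atomically put the new name in place
    # Flush the directory too, so the name -> inode entry is on disk.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

The directory fsync is the part that matters for this thread: without it the file's data can be safe on disk while the name pointing at it still lives only in memory.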
power failure 1:
ups had a capacitor blow sending it into bypass. the breaker tripped
causing a full power loss to all server racks (bad luck, i know...also
a 22 hour day last friday). after finally getting everything back on,
the same issue as above happened on a different server; server1.
two different boxes, same problem.
no other directories or users were affected, both times it was
/home/sde1 and /arcgis/sde1. the home directories are local to each
server and the /arcgis partition/mount-point is a different SAN LUN
for each server.
the only thing similar between the two is the naming used.
if "rm" was used, it would need to be recursive to remove all the
files from different directories, but since the directories themselves
are still available, i doubt this is the problem.
Are the files frequently used files? I'd also check for fs corruption
at this point. Which fs are you using here?
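One quick way to answer the "which fs" question is to look the path up in /proc/mounts. Here is a small Python sketch of that lookup, assuming a Linux system; the function names parse_mounts and fs_type_for are made up for illustration, and mount points containing spaces (which /proc/mounts escapes) are not handled.

```python
import os

def parse_mounts(text):
    """Parse /proc/mounts-style text into (device, mount_point, fs_type) tuples."""
    mounts = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            mounts.append((fields[0], fields[1], fields[2]))
    return mounts

def fs_type_for(path, mounts):
    """Return the fs type of the longest mount point containing path, or None."""
    path = os.path.normpath(path)
    best_len, best_type = -1, None
    for _device, mount_point, fs_type in mounts:
        mp = mount_point.rstrip("/") or "/"
        prefix = mp.rstrip("/") + "/"
        if path == mp or path.startswith(prefix):
            # ">=" lets a later mount on the same point win.
            if len(mp) >= best_len:
                best_len, best_type = len(mp), fs_type
    return best_type

if __name__ == "__main__":
    with open("/proc/mounts") as f:
        table = parse_mounts(f.read())
    for p in ("/home/sde1", "/arcgis/sde1"):
        print(p, fs_type_for(p, table))
```

Once the filesystem type is known, the matching fsck can be run against the unmounted device to check for corruption.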
if a combo find+regexp was used, it could be possible, but that would
be a complex command to issue, and only two users would have the
ability to issue it and remove these files, sde1 and root. i know that
i didn't do it.
from a security perspective, it could be said that user sde1 did
something she shouldn't have since /home/sde1 was cleaned out as well
(erasing history, etc) but, since we use a net intelligence logger and
can't see anything "bad" i doubt this is the case, either.
also note, that /arcgis/sde2 and /home/sde2 were fine along with all
other users and SAN LUNs.
anyone have any ideas?
Files in use or open, changes not flushed to disk, files disappear. Or
possibly the directory inode was in use/being modified. Same thing. I'd also
strongly suspect FS corruption at this point.
both servers are fully patched red hat enterprise v3 running oracle and
gis.
RHEL 3?! Jeebus. Update.
--dlloyd
_______________________________________________
members mailing list
EMAIL:PROTECTED
http://mlug.missouri.edu/mailman/listinfo/members