MLUG: Re: [MLUG] "missing" files after hard crash or power failure

Here is my guess. The directories under /home/sde1 are updated a lot, so the directories themselves (each of which is really a data file containing the names of the files in it, along with their associated inodes) are kept in cache, perhaps by the OS, but if not, then by the hard drive itself. When the power goes out, the cached directory data (i.e. the associations of the names with the inodes) is lost. All you are left with is the on-disk copy of the directory, which is presumably a rather old association of names with inodes: inodes which no longer point to anything useful. So in the fsck you presumably ran after the crash, those names got removed.
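To illustrate the idea that a directory is just another piece of data that can sit in cache, here is a minimal Python sketch of how an application could force both a file and its directory entry out of the OS cache. The function name durable_write and the ".tmp" suffix are made up for this example; I am not suggesting this is what Oracle or ArcSDE do. It also cannot help if the drive or SAN controller ignores flush requests in its own write cache.

```python
import os

def durable_write(path, data):
    """Write data to path, then flush the file and its parent directory.

    Hypothetical helper for illustration only.
    """
    tmp = path + ".tmp"  # assumed temporary name, not a convention from the thread
    with open(tmp, "wb") as f:
        f.write(data)
        f.flush()               # push Python's buffer to the OS
        os.fsync(f.fileno())    # ask the OS to push the file data to the device
    os.replace(tmp, path)       # atomically put the new name in place

    # The name -> inode association lives in the directory, so the
    # directory needs its own fsync (this works on Linux).
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Without the second fsync, the file contents may be safely on disk while the directory entry pointing at them is still only in memory, which is the situation guessed at above.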


shawn parker wrote:
i have two cases, now, where data went mysteriously "missing" after a
crash or power failure.

crash 1:

server3 hard locked and had to be manually power-cycled. upon
successful startup, all files in /home/sde1 and /arcgis/sde1 were
missing. *not* the directory structure, just the data files in each
directory.
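A quick way to measure this kind of damage (directories intact, files gone) is to walk the tree and list every directory that holds no files of its own. This is only a sketch; the function name is made up, and the path in the usage line is just the one from this report.

```python
import os

def dirs_without_files(root):
    """Yield each directory under root that directly contains no
    non-directory entries (subdirectories do not count as files)."""
    for dirpath, dirnames, filenames in os.walk(root):
        if not filenames:
            yield dirpath

if __name__ == "__main__":
    # Example path taken from the report; adjust as needed.
    for d in dirs_without_files("/home/sde1"):
        print(d)
```

Running the same check against an unaffected tree such as /home/sde2 would give a baseline for how many directories are normally empty.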

power failure 1:

ups had a capacitor blow sending it into bypass. the breaker tripped
causing a full power loss to all server racks (bad luck, i know...also
a 22 hour day last friday). after finally getting everything back on,
the same issue as above happened on a different server; server1.

two different boxes, same problem.

no other directories or users were affected, both times it was
/home/sde1 and /arcgis/sde1. the home directories are local to each
server and the /arcgis partition/mount-point is a different SAN LUN
for each server.

the only thing similar between the two is the naming used.

if "rm" was used, it would need to be recursive to remove all the
files from different directories, but since the directories themselves
are still available, i doubt this is the problem.

if a combo find+regexp was used, it could be possible, but that would
be a complex command to issue, and only two users would have the
ability to issue it and remove these files, sde1 and root. i know that
i didn't do it.

from a security perspective, it could be said that user sde1 did
something she shouldn't have since /home/sde1 was cleaned out as well
(erasing history, etc) but, since we use a net intelligence logger and
can't see anything "bad" i doubt this is the case, either.

also note, that /arcgis/sde2 and /home/sde2 were fine along with all
other users and SAN LUNs.

anyone have any ideas?

both servers are fully patched red hat enterprise v3 running oracle and gis.

the first time it was a test environment, no harm no foul. the second
time, though, it was production.

the issue is resolved, i simply restored the data from tape. but, i
would like to know why it happened in the first place.


