[4201] in Athena Bugs
NFS rename(2) bug
daemon@ATHENA.MIT.EDU (Michael A. Fetterman)
Mon Feb 12 14:24:18 1990
To: bugs@ATHENA.MIT.EDU
Return-Receipt-To: mafetter@ATHENA.MIT.EDU
Date: Mon, 12 Feb 90 14:23:26 EST
From: Michael A. Fetterman <mafetter@ATHENA.MIT.EDU>
I was bitten again today by a bug that at least one Athena NFS server displays
with the rename(2) system call. I was using sortm(1) (a la MH), which
effectively renames many (or all) pieces of mail in a folder. This bug
confused sortm enough to lose 30+ pieces of my mail.
The bug shows up as follows: a rename(2) call returns -1 (errno = 2, ENOENT, "No
such file or directory"), but the file DID exist before the call, and has now
been SUCCESSFULLY renamed. This sort of false error report explains the
lossage in sortm. (By the way, sortm did report all the errors that were
returned to it by rename, but it didn't guess that the errors weren't real --
why should it have? -- and thus trashed some files.)
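One way a caller could defend itself against this kind of false error is to check where the file actually ended up before trusting the return value. The following is only a sketch under assumptions: checked_rename is a hypothetical helper, not anything in sortm or MH, and it assumes the source file's device and inode numbers can be read before the call and compared afterwards.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/stat.h>

/* Hypothetical helper (not part of sortm or MH): rename `from` to `to`,
 * treating an ENOENT failure as success when the file demonstrably moved.
 * Returns 0 on success, -1 on a real failure with errno set. */
int checked_rename(const char *from, const char *to)
{
    struct stat before, after;
    int saved_errno;

    assert(from != NULL && to != NULL);

    /* Remember the identity of the source file before renaming. */
    if (lstat(from, &before) != 0)
        return -1;                  /* the source really is missing */

    if (rename(from, to) == 0)
        return 0;

    saved_errno = errno;

    /* False-error case: the old name is gone and the new name
     * refers to the same device and inode as the old one did. */
    if (saved_errno == ENOENT &&
        lstat(from, &after) != 0 && errno == ENOENT &&
        lstat(to, &after) == 0 &&
        after.st_dev == before.st_dev &&
        after.st_ino == before.st_ino)
        return 0;

    errno = saved_errno;
    return -1;
}
```

A caller would use checked_rename(old, new) wherever it currently calls rename(old, new). Note that an NFS client may cache attributes, so the lstat results could be stale; this check reduces the risk but is not a guarantee.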
This bug started creeping into my life when my home filesystem was first moved
to talos many moons ago. I sometimes noticed it when doing mv's, but mostly as
errors reported by sortm. Generally, I didn't lose any files (either out of
luck or I simply never noticed them go away), and I found that by sorting
again, I'd get the desired end result. For a while, I moved my Mail directory
to another filesystem to avoid the inconvenience. I recently moved it back to
talos, though. Today was the first time I noticed that I had a LOT fewer files
after sortm completed.
The bug is sporadic, but can be pretty easily invoked, and I have written a
demo program which does so. See /mit/mafetter/demo/demo.{vax,rt,mips}. This
program creates a zero length file in the current directory, and then renames
it 1000 times, reporting any errors on stderr. I have yet to run this program
from a directory on talos and get fewer than 2 errors reported, but it is rarely
above 10 (so the bug strikes on roughly 0.5% of renames). Depending on your
server and client, this program may take upwards of 10 minutes...
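For readers without access to that directory, the test described above might look roughly like the following. This is an illustrative reconstruction, not the original demo source: the function name count_rename_errors, the file names, and the scheme of renaming back and forth between two names are all assumptions.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Illustrative reconstruction, NOT the original demo program.
 * Creates a zero-length file in the current directory, renames it
 * `iterations` times, reports each failure on stderr, and returns
 * the number of failures (or -1 if the file cannot be created). */
int count_rename_errors(int iterations)
{
    /* Assumed names; the original program's names are unknown. */
    const char *names[2] = { "rename_demo.a", "rename_demo.b" };
    int errors = 0;
    int cur = 0;
    int fd, i;

    assert(iterations > 0);

    fd = open(names[cur], O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) {
        perror("open");
        return -1;
    }
    close(fd);

    for (i = 0; i < iterations; i++) {
        int next = 1 - cur;

        if (rename(names[cur], names[next]) != 0) {
            int e = errno;

            errors++;
            fprintf(stderr, "rename %d: %s -> %s: errno %d (%s)\n",
                    i, names[cur], names[next], e, strerror(e));

            /* The rename may have succeeded despite the error, so
             * look for the file under the new name before going on. */
            if (access(names[next], F_OK) != 0)
                next = cur;         /* the file did not move */
        }
        cur = next;
    }

    unlink(names[cur]);
    return errors;
}
```

Calling count_rename_errors(1000) from main() in a directory on the server under test would mirror the run described above. On a local filesystem the expected count is zero.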
You should be able to cd to /mit/mafetter/demo/ to try running the program, as
I have set the permissions to allow such attempts. In this way, you will be
testing the same NFS server I have had troubles with.
RT, VAX, and MIPS NFS clients all exhibit the same behavior, which leads me to
believe it is a server-side problem.
Local and AFS filesystems (in my experiments) do not exhibit this behavior.
I had initially suspected that the 3600s were at fault, but after testing, I've
found that I can get errors on talos (a 3600), charon and louiswu (both 750s),
and all the privatized vs2 and vs2000 and rt NFS servers I've tried. The
frequency of the bug varies from platform to platform, though.
Please feel free to cd to a directory on your favorite server and run the program
there.
Full source is also available in that directory.
Michael A Fetterman