[1427] in linux-net channel archive

Re: Strange behaviour with NFS (fwd)

daemon@ATHENA.MIT.EDU (Robert L Krawitz)
Sun Nov 26 03:44:49 1995

Date: Sat, 25 Nov 1995 20:48:25 -0500
From: Robert L Krawitz <rlk@tiac.net>
To: linux-net@vger.rutgers.edu
In-reply-to: <199511250546.XAA07925@kittpeak.ee.umn.edu>
	(glamm@mountains.ee.umn.edu)


   I guess I can't see how a dead soft NFS mount can lead to file corruption,
   unless the server dies and comes back to life in a fairly short time
   & the program doesn't check the return status of a sequence of writes.

Unfortunately, this is common.  If the server is simply slow, or the
network glitchy, there can easily be NFS timeouts.  Also, many (if not
most) programs never check the status of a write, assuming that it
will either succeed or run out of disk space.  Typically they might
just check the status of the final write.  In this case, if an earlier
write failed and the final one succeeded, they will think that the
whole sequence succeeded when in fact it did not.
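
As a rough illustration of checking every write rather than only the
last one, here is a minimal sketch in C.  The name write_all is
hypothetical, not taken from any particular program, and the sketch
assumes an ordinary blocking file descriptor.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Write all `len` bytes from `buf` to `fd`, checking every write().
 * Returns 0 on success, or -1 on the first failure with errno left
 * as set by write(). */
int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;   /* interrupted before any data moved; retry */
            return -1;      /* e.g. EIO after a soft-mount timeout */
        }
        p += n;             /* a short write is not an error; keep going */
        len -= (size_t)n;
    }
    return 0;
}
```

A caller would treat a -1 return from any call as a failure of the
whole sequence, rather than looking only at the result of the last
write.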

Another problem is that a program may do a single big write, which the
client splits into several NFS transactions (remember that NFS has a
protocol limit of 8K bytes per request), and perhaps one of them will
fail.  In this case, the program will get an I/O error without knowing
how much data actually got written successfully.  It's possible to
recover from this, but it requires doing an fstat on the file
descriptor (and hoping that that doesn't time out).
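
The fstat recovery described above might look roughly like the
following C sketch.  The names bytes_landed and append_checked are
hypothetical, and the sketch assumes the file is only ever appended to
by this one program, so that the growth in st_size is a usable hint.
On an NFS client the reported size may come from cached attributes, so
treat the result as a hint and not a guarantee.  EINTR handling is left
out for brevity.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* How many bytes past offset `start` does the file's size show?
 * Returns -1 if fstat() itself fails (it can time out too). */
off_t bytes_landed(int fd, off_t start)
{
    struct stat st;

    if (fstat(fd, &st) < 0)
        return -1;
    return st.st_size > start ? st.st_size - start : 0;
}

/* Append `len` bytes to `path`.  Returns 0 on success with *landed set
 * to len.  On a write error returns -1 and sets *landed to the byte
 * count fstat() reports past the starting size; *landed is -1 when
 * that could not be determined. */
int append_checked(const char *path, const char *buf, size_t len,
                   off_t *landed)
{
    struct stat st;
    size_t done = 0;
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);

    *landed = -1;
    if (fd < 0)
        return -1;
    if (fstat(fd, &st) < 0) {       /* remember the starting size */
        close(fd);
        return -1;
    }
    while (done < len) {
        ssize_t n = write(fd, buf + done, len - done);
        if (n < 0) {
            *landed = bytes_landed(fd, st.st_size);
            close(fd);
            return -1;
        }
        done += (size_t)n;
    }
    if (close(fd) < 0)              /* close() can report a delayed error */
        return -1;
    *landed = (off_t)len;
    return 0;
}
```

When append_checked fails with *landed >= 0, a program could resume
from that point; when *landed is -1, it can only start over or give up.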

It's also rough on programs that are good citizens, because they don't
know if it's an unrecoverable error or if the next write will succeed.

   At the supercomputer institute I used to work at we mounted everything
   soft because dead hard NFS mounts would tend to kill all users on the
   machines, not just a few of them (e.g., having multiple home directories
   mounted via NFS).  I think data loss only occurred maybe once during the
   period of time I was there (18 months) over multiple unscheduled hangs
   of the server machines. ;)

Interruptible NFS mounts (the intr mount option) are good for this
situation, because a hung operation can be interrupted by a signal
instead of blocking indefinitely.

-- 
Robert Krawitz <rlk@tiac.net>           http://www.tiac.net/users/rlk/

Member of the League for Programming Freedom  -- mail lpf@uunet.uu.net
Tall Clubs International  --  tci-request@aptinc.com or 1-800-521-2512