[1694] in Hotline Meeting


m66-070-server

daemon@ATHENA.MIT.EDU (David Krikorian)
Thu Sep 13 00:37:47 1990

Date: Thu, 13 Sep 90 00:37:15 -0400
From: David Krikorian <dkk@ATHENA.MIT.EDU>
To: op@ATHENA.MIT.EDU, hotline@ATHENA.MIT.EDU
Cc: oliver@ATHENA.MIT.EDU, kyrlidis@ATHENA.MIT.EDU, jfc@ATHENA.MIT.EDU
In-Reply-To: [0884] in op
Reply-To: dkk@mit.edu


With the invaluable help of Jim Oliver (oliver@athena) and jfc, I was
able to solve the m66-070-server mystery tonight.

The /site partition on the server was filling up periodically:

------------
tisiphone# rsh m66-070-server 'grep "file system full" /usr/adm/messages'
Aug  5 01:26:12 m66-070-server vmunix: /site: file system full
Sep  1 01:36:34 m66-070-server vmunix: /site: file system full
Sep  2 01:36:29 m66-070-server vmunix: /site: file system full
Sep  3 01:36:42 m66-070-server vmunix: /site: file system full
Sep  4 01:38:08 m66-070-server vmunix: /site: file system full
Sep  5 01:37:51 m66-070-server vmunix: /site: file system full
Sep  6 01:41:15 m66-070-server vmunix: /site: file system full
Sep  7 01:41:02 m66-070-server vmunix: /site: file system full
Sep  8 01:39:55 m66-070-server vmunix: /site: file system full
Sep  9 01:40:47 m66-070-server vmunix: /site: file system full
Sep 10 01:40:07 m66-070-server vmunix: /site: file system full
Sep 11 01:44:37 m66-070-server vmunix: /site: file system full
Sep 12 01:43:47 m66-070-server vmunix: /site: file system full

------------
Do those times look familiar?  They should.  That's during the Moira
update.  To be more specific, that's exactly once during every Moira
update since the beginning of the month.  Comparing /usr/etc/cred* on
m66-070-server and Cezanne (another RT NFS server) showed that
credentials.pag (the large binary database file used for NFS mappings
from a client) was only about 90% of its expected size on m66-070-server.
I deleted a couple of old, useless copies of credentials.pag, rebuilt
the database with the mkcred command, and restarted rpc.mountd.
Service is back to its normal flawless state.
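
For anyone who wants to repeat the check on another server, here is a
rough sketch of how the times could be pulled out of the log so they
can be lined up against the Moira update window.  The function name
site_full_times is made up for illustration, and it assumes the syslog
line layout shown in the excerpt above and an awk on the machine where
it runs.

```shell
# Read syslog lines on stdin and print "Mon DD HH:MM" for each
# "/site: file system full" message.  Other partitions are ignored.
# (site_full_times is a hypothetical helper, not an existing tool.)
site_full_times() {
    awk '/\/site: file system full/ { print $1, $2, substr($3, 1, 5) }'
}
```

Something like `rsh m66-070-server cat /usr/adm/messages | site_full_times`
should then list one line per occurrence, which makes a daily pattern
easy to spot.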

Lessons:

 - It's nearly essential for a user to demonstrate to us interactively
   what is failing and how it's failing.  We already knew that, but I
   just thought I'd point it out again.  If I didn't have Jim to work
   with, I wouldn't have known where to start looking for problems,
   because there are SO MANY possibilities.

	*** OPS READ THIS ***

 - An RT NFS fileserver MUST have 5meg free on its /site partition for
   the Moira credentials update to succeed.  A VAX NFS fileserver only
   needs about 2meg.  Since we need to leave at least 10meg or 15meg
   for the rest of the /site partition, that means the bare minimum
   size for an RT NFS fileserver /site partition is about 15meg, and
   the preferred size should be at least 25meg.  (We shouldn't deploy
   anything with less.)  The server in question had a /site partition
   of 28meg.
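
The arithmetic behind the bare minimum can be written down as a small
sketch.  The function names moira_headroom_mb and site_min_mb are
invented here, the only numbers used are the 5meg (RT) and 2meg (VAX)
figures above, and it assumes a POSIX shell with $(( )) arithmetic.
Note that the preferred 25meg is more generous than this sum gives even
with 15meg reserved, so treat the result as a floor, not a target.

```shell
# Free space (in meg) the Moira credentials update needs on /site,
# using the figures quoted in the message above.
moira_headroom_mb() {
    case "$1" in
        rt)  echo 5 ;;
        vax) echo 2 ;;
        *)   echo "unknown server type: $1" >&2; return 1 ;;
    esac
}

# Minimum /site size in meg: space reserved for everything else on
# the partition ($2) plus the update headroom for the server type ($1).
site_min_mb() {
    headroom=$(moira_headroom_mb "$1") || return 1
    echo $(( $2 + headroom ))
}
```

For example, `site_min_mb rt 10` prints 15, which matches the bare
minimum quoted for an RT NFS fileserver.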

