[19503] in Athena Bugs

sun4 9.0.13: KNFS and syslogd

daemon@ATHENA.MIT.EDU (0000-Admin(0000))
Tue Jul 31 18:18:03 2001

Message-Id: <200107312218.SAA08484@weepecket.mit.edu>
To: bugs@MIT.EDU
Cc: cavin@mit.deu
Date: Tue, 31 Jul 2001 18:18:00 -0400
From: 0000-Admin(0000) <root@weepecket.mit.edu>

System name:		weepecket.mit.edu
Type and version:	Ultra-5_10 9.0.13 (with mkserv)
Display type:		ffb

Shell:			/bin/athena/tcsh
Window manager:		unknown

What were you trying to do?

	Use a private Athena workstation as a KNFS server, and access
	files from a second private Athena workstation.

What's wrong:

       A "normal" user job on Cuttyhunk (Athena 8.4.25; it is not
       taking an upgrade for some reason that I have not yet checked)
       is doing I/O on Weepecket.  The job does a lot of I/O (many MB,
       possibly up to a GB, in files of about a MB each) and
       seems to be running normally, but the file server, Weepecket,
       is having problems.

       Running "dmesg" on Weepecket shows about 200 of the messages
       below, which seem to have filled up a buffer or pipe.  Syslogd
       is also taking a lot of CPU cycles.

  Jul 31 17:33:05 weepecket.mit.edu nfssrv: [ID 785379 kern.notice] NFS server getting AKred for uid 30559 
  Jul 31 17:33:05 weepecket.mit.edu nfssrv: [ID 719455 kern.notice] NFS server got AKred for uid 30559 
  Jul 31 17:33:05 weepecket.mit.edu nfssrv: [ID 785379 kern.notice] NFS server getting AKred for uid 30559 
  Jul 31 17:33:05 weepecket.mit.edu nfssrv: [ID 719455 kern.notice] NFS server got AKred for uid 30559 
  Jul 31 17:33:05 weepecket.mit.edu nfssrv: [ID 785379 kern.notice] NFS server getting AKred for uid 30559 
  Jul 31 17:33:05 weepecket.mit.edu nfssrv: [ID 719455 kern.notice] NFS server got AKred for uid 30559 
  Jul 31 17:33:05 weepecket.mit.edu nfssrv: [ID 785379 kern.notice] NFS server getting AKred for uid 30559 
  Jul 31 17:33:05 weepecket.mit.edu nfssrv: [ID 719455 kern.notice] NFS server got AKred for uid 30559 
  Jul 31 17:33:05 weepecket.mit.edu nfssrv: [ID 785379 kern.notice] NFS server getting AKred for uid 30559 
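
One way to gauge how fast these messages arrive is to tally them per
second, message ID, and uid.  The following is a minimal Python sketch,
assuming log lines in the format quoted above; the names summarize,
LINE_RE, UID_RE, and SAMPLE are illustrative and are not part of the
Athena release or of syslogd.

```python
import re
from collections import Counter

# Matches lines in the format quoted above:
#   <Mon DD HH:MM:SS> <host> <tag>: [ID <n> <facility.level>] <text>
LINE_RE = re.compile(
    r"^\s*(?P<stamp>\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+(?P<host>\S+)\s+"
    r"(?P<tag>[^:]+):\s+\[ID\s+(?P<msgid>\d+)\s+(?P<prio>[\w.]+)\]\s+"
    r"(?P<text>.*?)\s*$"
)
# Pulls the numeric uid out of the message text, if one is present.
UID_RE = re.compile(r"\buid\s+(\d+)")


def summarize(lines):
    """Count messages per (timestamp, message ID, uid)."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            continue  # skip anything that is not in the expected format
        uid = UID_RE.search(m.group("text"))
        key = (m.group("stamp"), m.group("msgid"),
               uid.group(1) if uid else None)
        counts[key] += 1
    return counts


# Three lines copied from the dmesg output quoted in this report.
SAMPLE = [
    "  Jul 31 17:33:05 weepecket.mit.edu nfssrv: [ID 785379 kern.notice] NFS server getting AKred for uid 30559 ",
    "  Jul 31 17:33:05 weepecket.mit.edu nfssrv: [ID 719455 kern.notice] NFS server got AKred for uid 30559 ",
    "  Jul 31 17:33:05 weepecket.mit.edu nfssrv: [ID 785379 kern.notice] NFS server getting AKred for uid 30559 ",
]

if __name__ == "__main__":
    for (stamp, msgid, uid), n in sorted(summarize(SAMPLE).items()):
        print(stamp, "ID", msgid, "uid", uid, "count", n)
```

Feeding the saved dmesg output or the syslog file to summarize would
show whether the "getting AKred" / "got AKred" pairs are logged many
times per second for a single uid, as the excerpt above suggests.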

	Other problems include users being unable to log out (the
	system hangs, and the only way to clear the sessions was to
	kill all their processes and then send a HUP to the dm
	process) and general unresponsiveness.

	At some point earlier in the day, messages were being written
	to the console, interfering with a user session there.  The
	messages said that something was full and asked whether
	syslogd was still running (it was).  Restarting syslogd
	cleared the messages from the console, but also left
	"xterm -C" unable to connect to /dev/console.

What should have happened:

	As far as I know, there shouldn't be any errors or warnings
	from these operations.

Please describe any relevant documentation references:

       Some of this may be the result of too many users on the system
       at one time.  One user in particular started 3 jobs at the same
       time (an scp and 2 bzip2 runs, all three on GB-sized
       files) on the same disk.  (This lab is working on getting
       more machines, but right now they have 10 people, one Ultra 10
       as a file server, and very large jobs.)

       The real question is whether this is normal behavior for this
       configuration under extreme load, or if this is some more
       general problem with the Athena 9.0 release or the interaction
       between 9.0 and 8.4.

       I'd like to get the lab to reboot the system, but I'd also like
       to have some idea of whether that is likely to make a difference.

Thanks,

	--Tom