[19485] in Athena Bugs
Re: sun4 9.0.13: KNFS and nfs.server ownership?
daemon@ATHENA.MIT.EDU (Tom Cavin)
Mon Jul 30 17:43:12 2001
From: Tom Cavin <tec@ai.mit.edu>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Message-Id: <15205.54509.636685.391917@vivaldi.mit.edu>
Date: Mon, 30 Jul 2001 17:43:09 -0400 (EDT)
To: Garry Zacheiss <zacheiss@mit.edu>
Cc: Thomas E Cavin <cavin@mit.edu>, bugs@mit.edu
In-Reply-To: <200107301959.PAA28095@dale.mit.edu>
Hi Garry,
Further info on this matter.
There have been a few problems on Hchinasky and Weepecket that I have
had to solve without going through extensive diagnostics, so I may
well have missed something.
Here is the story from memory -- sorry, no notes.
Hchinasky and Weepecket are private Athena workstations, with attached
RAID systems from RAIDKING, running KNFS. Weepecket serves knfs
lockers nklab-1 through nklab-5, and Hchinasky serves wagner-1 through
wagner-9. Both servers have AUTOUPDATE=true. (This will change, but
hasn't yet. :-) Users from the labs are allowed on the console.
Hchinasky took the update first (last week), but it apparently didn't
have a clean file system when it started, so it ended up in maintenance
mode waiting for fsck. When I ran fsck, it cleaned up the offending
file systems on the IDE drive, but the system didn't even have devices
for the RAID, so fsck could not complete. Eventually, after realizing
that I didn't seem to have emacs or vi, I used sed to generate a new
vfstab file that didn't mention the RAID partitions. Then I rebooted
and the system took the update. After it finished, it had the full
configuration back including the RAID. I didn't check KNFS.
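
For anyone who lands in the same maintenance-mode situation, the sed
step can be sketched roughly as follows. The controller pattern "c1t",
the helper name strip_raid, and the backup file name are assumptions
for illustration only; the actual RAID device names on Hchinasky are
not recorded in this message.

```shell
# Sketch only: drop vfstab entries for a (hypothetical) RAID controller c1
# so that fsck and the boot sequence no longer reference missing devices.
# strip_raid IN OUT: copy IN to OUT, omitting lines for /dev/dsk/c1t* devices.
strip_raid() {
    sed '/^\/dev\/dsk\/c1t/d' "$1" > "$2"
}

# Possible use in maintenance mode (keep a copy of the original first):
#   cp /etc/vfstab /etc/vfstab.orig
#   strip_raid /etc/vfstab.orig /etc/vfstab
```

Writing to a second file rather than editing in place keeps the
original vfstab available for restoring the RAID entries afterwards.
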
Weepecket took the update over the weekend, and seemed to work,
although there were some garbled reports as of last night. This
morning, the system worked fine from the console but wasn't running
KNFS. I looked around a bit, and ran nfs.server start/stop and was
able to confirm that the problem seemed to be the service not having
been run as opposed to being incorrectly configured. Then I noticed
that /etc/init.d/nfs.server was not owned by root and had permissions
744. I really didn't think the missing group and other execute
permissions would stop root from executing the file, but I reset the
ownership anyway, and rebooted the system.
This time it came up and ran "nfs.server start" without problem.
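
A quick way to inspect and reset the script's ownership, as a sketch.
The helper name show_owner_mode is made up for this note, and the
chown has to be run as root.

```shell
# show_owner_mode FILE: print "<mode> <owner>" as reported by ls -l.
show_owner_mode() {
    ls -l "$1" | awk '{ print $1, $3 }'
}

# On the server, as root:
#   show_owner_mode /etc/init.d/nfs.server
#   chown root /etc/init.d/nfs.server
#   /etc/init.d/nfs.server start
```
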
About 15 minutes later, I got a call that the console screen was
scrolling with a message about syslogd. I logged in remotely and
started an "xterm -C" and sure enough:

  message overflow on /dev/log minor #6 -- is syslogd(1M) running?
  message overflow on /dev/log minor #6 -- is syslogd(1M) running?
  message overflow on /dev/log minor #6 -- is syslogd(1M) running?

I then did a "dmesg" and got:

  Jul 30 14:03:09 weepecket.mit.edu nfssrv: [ID 719455 kern.notice] NFS server got AKred for uid 23485
  Jul 30 14:03:09 weepecket.mit.edu nfssrv: [ID 785379 kern.notice] NFS server getting AKred for uid 23485
  Jul 30 14:03:09 weepecket.mit.edu nfssrv: [ID 719455 kern.notice] NFS server got AKred for uid 23485

  weepecket.mit.edu# hesinfo 23485 uid
  bjbalas:*:23485:101:Benjamin J Balas,,,,6172251709:/mit/bjbalas:/bin/athena/tcsh

The user in question was trying to get some data from the server to
another lab system -- a legitimate user and operation.
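
To see which users are behind a flood of these messages, the uids can
be pulled out of the log text and counted, then looked up with hesinfo
as I did by hand. This is only a sketch: the function name
count_akred_uids is invented here, and it assumes the "AKred for uid"
message format seen in the dmesg output.

```shell
# count_akred_uids: read log lines on stdin and print "<count> <uid>"
# for each uid that appears in an "AKred for uid <n>" message.
count_akred_uids() {
    sed -n 's/.*AKred for uid \([0-9][0-9]*\).*/\1/p' |
        sort | uniq -c | awk '{ print $1, $2 }'
}

# Possible use:
#   dmesg | count_akred_uids
#   hesinfo 23485 uid
```
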
I ran "/etc/init.d/syslogd stop" followed by "/etc/init.d/syslogd
start" and the problem went away.
I also made some slight changes to attach.conf (adding the noac flag
to the KNFS mounts, and turning ownercheck on).
I then went back to Hchinasky and made similar changes to attach.conf,
reset the ownership of the nfs.server file, and ran "nfs.server
start".
As of now, both machines seem to be running normally. Or at least no
users have reported further problems.
I have two other similar unused KNFS systems that I can play with if
we want to try and debug this, but I haven't checked to see if they
exhibit the problem, and I'm not even sure if they have proper Hesiod
entries yet.
Please let me know if there is anything further that I can do to help
with this. It is not an urgent problem for me now, but it could be a
surprise for someone else later if it isn't solved.
Thanks,
--Tom
Garry Zacheiss writes:
> >> As far as I can tell, the problem seems to be the ownership of
> >> the /etc/init.d/nfs.server file.
>
> The problem actually seems to be a lot more subtle than this
> (file ownership shouldn't matter for init scripts, and I verified this
> is the case on a clean 9.0 box). If you look at hchinaski, you'll see
> that your nfsd has been running since the machine was rebooted on 7/26,
> but that mountd only started earlier today. Both nfsd and mountd are
> started from the "nfs.server" script, so it must have run. I'm not
> completely certain why mountd didn't start (missing credentials file is
> common, but not the case here, as mountd would've syslogged about it)
>
> All that said, I chowned the files in the mkserv locker to be
> owned by root, since there's no reason to be installing files owned by
> me; I also made knfs.add chown the things it installs, just to be on the
> safe side.
>
> Garry
--
Tom Cavin Phone: (617) 258 - 7806
WCCF Computer Operations Manager Email: tec@ai.mit.edu