[15041] in Athena Bugs
Re: [Dorothy Bowe : Frame problems on SGIs ]
daemon@ATHENA.MIT.EDU (John Hawkinson)
Mon Mar 31 21:56:59 1997
Date: Mon, 31 Mar 1997 21:56:46 -0500
To: dot@MIT.EDU, bugs@MIT.EDU
Cc: kretch.jdaniel@MIT.EDU, nygren@MIT.EDU, op@MIT.EDU, f_l@MIT.EDU
In-Reply-To: "[8658] in Consulting_FYI"
From: John Hawkinson <jhawk@MIT.EDU>
[insert minor flame about discussing workstation/release issues on op,
but I guess there's a theory that it's a license server problem... ]
Twice this evening users from the W20 cluster logged in on SGIs
reported problems wherein starting Frame hung. There was no reason
to believe this was a license problem, as Frame never mapped a window.
I looked at one of these, wherein trace (watchmaker) indicated Frame
kept continually SIGALRM-ing.
Pulling a rabbit out of my hat, I observed that the rpcbind/portmapper
on the machine did not respond to "/usr/etc/rpcinfo -p". Restarting
rpcbind fixed this problem.
In a survey of all the SGIs in the W20 host table, these
machines have nonresponsive rpcbinds, per rpcinfo -p:
18.187.0.122
18.187.0.107
18.187.0.124
18.187.0.129
18.187.0.89
18.187.0.103
18.187.0.126
18.187.0.121
18.187.0.127
18.187.0.86
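For anyone repeating the survey, here is a rough sketch of how it could be scripted. It is not the procedure actually used above. The function name survey_rpcbind and the PROBE override are made up for illustration, and the default probe assumes a timeout(1) command (GNU coreutils) is available so that a hung rpcbind cannot stall the loop.

```shell
#!/bin/sh
# Hypothetical helper: print every host whose rpcbind/portmapper does not
# answer a probe. The probe defaults to "rpcinfo -p" wrapped in timeout(1)
# (an assumption: timeout is GNU coreutils and may not exist everywhere).
# Set PROBE to substitute another command, e.g. for a dry run.
survey_rpcbind() {
    for host in "$@"; do
        # $PROBE is deliberately unquoted so "cmd arg arg" splits into words.
        if ! ${PROBE:-timeout 10 rpcinfo -p} "$host" >/dev/null 2>&1; then
            echo "$host"
        fi
    done
}
```

Something like "survey_rpcbind 18.187.0.122 18.187.0.107" would then list only the hosts that failed the probe.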
It's been speculated that this is due to a release bug.
Perhaps someone else feels inclined to go get a stack trace
from these rpcbind processes and disassemble them in time
for tomorrow's release team meeting.
--jhawk
[8658] daemon@ATHENA.MIT.EDU (Dorothy Bowe) Consulting_FYI 03/31/97 11:30 (57 lines)
Subject: [Dorothy Bowe <dot@MIT.EDU> : Frame problems on SGIs ]
To: cfyi@MIT.EDU
Date: Mon, 31 Mar 1997 11:30:48 EST
From: Dorothy Bowe <dot@MIT.EDU>
Here's some more information on debugging the problem.
Dot
------- Forwarded Message
From: Dorothy Bowe <dot@MIT.EDU>
To: kretch@MIT.EDU, jdaniel@MIT.EDU
Cc: ops@MIT.EDU, dot@MIT.EDU
Date: Mon, 31 Mar 1997 11:29:08 -0500
Subject: Frame problems on SGIs
Thanks for the information on the recent FrameMaker problems. I haven't
duplicated it yet myself, but I have seen similar problems in the past.
In those cases the problem was traced to the registration of RPC program
number 300214, but the startup script should be taking care of that now,
as it has been.
Here's one thing you can try if you encounter someone with this problem:
maker -nlverbose
When RPC is functioning correctly, the output should look like this:
maker: found FM_FLS_HOST
maker: 1997/03/31-11:24:08 FlcToFlsCheckOut
maker: 1997/03/31-11:24:08 Connecting to FLS on host gooshi
maker: 1997/03/31-11:24:08 realInitFlsConn: start
maker: 1997/03/31-11:24:09 Asking FLS for license
maker: 1997/03/31-11:24:09 destroyFlsConn
maker: Starting FrameMaker 5. Copyright (c) 1986-1995 Frame Technology Corp.
maker: Finished loading
maker: 1997/03/31-11:25:39 NlCheckInLicense
maker: 1997/03/31-11:25:39 FlcToFlsCheckIn
maker: 1997/03/31-11:25:39 Connecting to FLS on host gooshi
maker: 1997/03/31-11:25:40 realInitFlsConn: start
If it hangs right away, rpc is probably the problem. You should be able
to fix it as a user by typing
/usr/etc/rpcinfo -u $host 300214
or as root
/usr/etc/rpcinfo -d 300214
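Before trying either command, it may help to see whether program 300214 is registered at all. The following is only a sketch: is_registered is a made-up helper name, and it assumes the usual "rpcinfo -p" listing with the program number in the first column.

```shell
#!/bin/sh
# Hypothetical helper: read "rpcinfo -p" output on stdin and succeed
# (exit status 0) if the given program number appears in column one.
is_registered() {
    awk -v prog="$1" '$1 == prog { found = 1 } END { exit !found }'
}
```

For example: /usr/etc/rpcinfo -p $host | is_registered 300214 && echo registered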
I'll continue to look into the problem.
Dot
------- End of Forwarded Message
--[8658]--