[15041] in Athena Bugs


Re: [Dorothy Bowe : Frame problems on SGIs ]

daemon@ATHENA.MIT.EDU (John Hawkinson)
Mon Mar 31 21:56:59 1997

Date: Mon, 31 Mar 1997 21:56:46 -0500
To: dot@MIT.EDU, bugs@MIT.EDU
Cc: kretch.jdaniel@MIT.EDU, nygren@MIT.EDU, op@MIT.EDU, f_l@MIT.EDU
In-Reply-To: "[8658] in Consulting_FYI"
From: John Hawkinson <jhawk@MIT.EDU>

[insert minor flame about discussing workstation/release issues on op,
 but I guess there's a theory that it's a license server problem... ]

Twice this evening, users logged in on SGIs in the W20 cluster
reported that Frame hung on startup. There was no reason to believe
this was a license problem, as Frame never mapped a window.

I looked at one of these, where trace (watchmaker) showed Frame
repeatedly receiving SIGALRM.

Pulling a rabbit out of my hat, I observed that the rpcbind/portmapper
on the machine did not respond to "/usr/etc/rpcinfo -p". Restarting
rpcbind fixed this problem.

In a survey of all the SGIs in the host table in W20, these
machines have nonresponsive rpcbinds, per rpcinfo -p:

	18.187.0.122
	18.187.0.107
	18.187.0.124
	18.187.0.129
	18.187.0.89
	18.187.0.103
	18.187.0.126
	18.187.0.121
	18.187.0.127
	18.187.0.86
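
A survey like the one above could be scripted roughly as follows. This
is only a sketch: the function names, the RPCINFO and TIMEOUT_SECS
variables, and the use of a timeout(1) utility are assumptions for
illustration, not what was actually run.

```shell
#!/bin/sh
# Sketch: report which hosts have a portmapper that fails to answer
# "rpcinfo -p". RPCINFO and TIMEOUT_SECS are hypothetical knobs, and
# timeout(1) is assumed to be available on the machine running this.
RPCINFO="${RPCINFO:-/usr/etc/rpcinfo}"
TIMEOUT_SECS="${TIMEOUT_SECS:-10}"

check_rpcbind() {
    # Exit 0 only if the host's portmapper answers within the time limit.
    timeout "$TIMEOUT_SECS" "$RPCINFO" -p "$1" >/dev/null 2>&1
}

survey() {
    # Print one status line per host named on the command line.
    for host in "$@"; do
        if check_rpcbind "$host"; then
            echo "$host responsive"
        else
            echo "$host NONRESPONSIVE"
        fi
    done
}
```

For example, "survey 18.187.0.122 18.187.0.107" prints one line per
host. The time limit matters because a wedged rpcbind may never answer
at all, which would otherwise stall the whole loop.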

It's been speculated that this is due to a release bug.
Perhaps someone else feels inclined to go get a stack trace
from these rpcbind processes and disassemble them in time
for tomorrow's release team meeting.

--jhawk

[8658]  daemon@ATHENA.MIT.EDU (Dorothy Bowe) Consulting_FYI 03/31/97 11:30 (57 lines)
Subject: [Dorothy Bowe <dot@MIT.EDU> : Frame problems on SGIs ]
To: cfyi@MIT.EDU
Date: Mon, 31 Mar 1997 11:30:48 EST
From: Dorothy Bowe <dot@MIT.EDU>

Here's some more information on debugging the problem.

		Dot


------- Forwarded Message

From:    Dorothy Bowe <dot@MIT.EDU>
To:      kretch@MIT.EDU, jdaniel@MIT.EDU
Cc:      ops@MIT.EDU, dot@MIT.EDU
Date:    Mon, 31 Mar 1997 11:29:08 -0500
Subject: Frame problems on SGIs


Thanks for the information on the recent FrameMaker problems.  I haven't
duplicated it yet myself, but I have seen similar problems in the past.
In those cases the problem was traced to the registration of RPC program
number 300214, but the startup script should already be taking care of
that, as it has been.

Here's one thing you can try if you encounter someone with this problem:

	maker -nlverbose

When rpc is functioning correctly, the output should look like this:

maker: found FM_FLS_HOST
maker: 1997/03/31-11:24:08  FlcToFlsCheckOut
maker: 1997/03/31-11:24:08  Connecting to FLS on host gooshi
maker: 1997/03/31-11:24:08  realInitFlsConn: start
maker: 1997/03/31-11:24:09  Asking FLS for license
maker: 1997/03/31-11:24:09  destroyFlsConn
maker: Starting FrameMaker 5. Copyright (c) 1986-1995 Frame Technology Corp.
maker: Finished loading
maker: 1997/03/31-11:25:39  NlCheckInLicense
maker: 1997/03/31-11:25:39  FlcToFlsCheckIn
maker: 1997/03/31-11:25:39  Connecting to FLS on host gooshi
maker: 1997/03/31-11:25:40  realInitFlsConn: start

If it hangs right away, RPC is probably the problem.  You should be able
to fix it as a user by typing

	/usr/etc/rpcinfo -u $host 300214

or as root

	/usr/etc/rpcinfo -d 300214
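
To judge from a captured log whether maker "hung right away", something
like the following sketch might help. The marker strings are copied from
the healthy output above; treating them as stage markers, and the
function name license_stage, are assumptions for illustration only.

```shell
#!/bin/sh
# Sketch: decide how far a captured "maker -nlverbose" log got.
# The marker strings come from the healthy log quoted above; using them
# as stage markers is an assumption, not documented FrameMaker behavior.

license_stage() {
    # $1: path to a file holding the captured verbose output
    if grep -q 'Finished loading' "$1"; then
        echo "started"
    elif grep -q 'Asking FLS for license' "$1"; then
        echo "license-requested"
    else
        echo "stalled-before-license"
    fi
}
```

One way to use it: run "maker -nlverbose > /tmp/maker.log 2>&1 &", wait
a little, then run "license_stage /tmp/maker.log". A result of
"stalled-before-license" would be consistent with the RPC hang
described here, though it does not prove it.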

I'll continue to look into the problem.

		Dot

------- End of Forwarded Message
--[8658]--
