[7611] in Release_7.7_team
cluster-machines without athinfod
daemon@ATHENA.MIT.EDU (Jonathon Weiss)
Wed Aug 17 18:10:20 2011
Date: Wed, 17 Aug 2011 18:10:12 -0400 (EDT)
Message-Id: <201108172210.p7HMACTF014223@speaker-for-the-dead.mit.edu>
To: release-team@MIT.EDU
From: Jonathon Weiss <jweiss@MIT.EDU>
The following cluster/dorm machines are not running athinfod (nagios
reports "CRITICAL - Could not create socket: Connection refused"):
w20-575-80.mit.edu
w20-575-48.mit.edu
mccormick-1.mit.edu
m66-080-19.mit.edu
m66-080-1.mit.edu
m1-115-14.mit.edu
m1-115-1.mit.edu
eos-8.mit.edu
Nagios reports "(Service Check Timed Out)" for these, although the machines still respond to ping:
simmons-1.mit.edu
m1-115-21.mit.edu
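
For anyone who wants to re-check the two groups by hand rather than rely on nagios, here is a minimal sketch that tells "connection refused" apart from a timeout with a plain TCP connect. It is not the nagios check itself. The function name probe is made up for illustration, and the athinfod port is deliberately left as a command-line argument because it is not stated in this message. A daemon that accepts the connection but never answers would still show up as "open" here, so this only narrows things down.

```python
import socket
import sys


def probe(host, port, timeout=5.0):
    """Classify a TCP connect attempt as 'open', 'refused', 'timeout', or 'error: ...'.

    Hypothetical helper for illustration; it only tests whether something
    is listening, not whether the service behind the port responds.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # Host is up, but nothing is listening on the port.
        return "refused"
    except socket.timeout:
        # No answer to the connect within the timeout.
        return "timeout"
    except OSError as exc:
        # DNS failure, unreachable network, and so on.
        return f"error: {exc}"


if __name__ == "__main__":
    # Usage (port is an assumption you must supply):
    #   python probe.py PORT host1 [host2 ...]
    port = int(sys.argv[1])
    for host in sys.argv[2:]:
        print(f"{host}: {probe(host, port)}")
```

Run it with the service port followed by the hostnames from the lists above; hosts in the first group would be expected to print "refused" if the daemon is simply not running.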
For either or both groups, does anyone want to investigate these at
all, or should I just request that hotline re-install them?
There are also 66 cluster/dorm machines that are down according to
nagios. Has release-team completed any post-release walk-through that
they were planning? (I'll note that 66 is only a little higher than
average, though the average itself feels a lot higher than it ought
to be.)
https://clusters.mit.edu/cluster/cgi-bin/status.cgi?hostgroup=all&style=hostdetail&hoststatustypes=4&hostprops=42
Jonathon