[7611] in Release_7.7_team


cluster-machines without athinfod

daemon@ATHENA.MIT.EDU (Jonathon Weiss)
Wed Aug 17 18:10:20 2011

Date: Wed, 17 Aug 2011 18:10:12 -0400 (EDT)
Message-Id: <201108172210.p7HMACTF014223@speaker-for-the-dead.mit.edu>
To: release-team@MIT.EDU
From: Jonathon Weiss <jweiss@MIT.EDU>


The following cluster/dorm machines are not running athinfod (nagios
reports "CRITICAL - Could not create socket: Connection refused"):

w20-575-80.mit.edu
w20-575-48.mit.edu
mccormick-1.mit.edu
m66-080-19.mit.edu
m66-080-1.mit.edu
m1-115-14.mit.edu
m1-115-1.mit.edu
eos-8.mit.edu

Nagios reports "(Service Check Timed Out)" for these, but the machines respond to ping:
simmons-1.mit.edu
m1-115-21.mit.edu
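
For anyone who wants to reproduce the two failure modes by hand before
deciding, here is a minimal sketch of a TCP probe that separates
"connection refused" (host up, nothing listening) from a timeout (host
up, service not answering). It is not the nagios check itself. The
probe() helper and the ATHINFO_PORT value are assumptions made for
illustration; substitute whatever port athinfod actually listens on in
your setup.

```python
import socket

# Assumption: the TCP port athinfod listens on. Verify before relying on it.
ATHINFO_PORT = 49155


def probe(host, port, timeout=5.0):
    """Classify one TCP connect attempt as 'open', 'refused', 'timeout' or 'error'."""
    try:
        # create_connection resolves the name and attempts a TCP connect.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # Host answered, but nothing is listening on the port.
        return "refused"
    except socket.timeout:
        # No answer within the timeout; the service may be hung or filtered.
        return "timeout"
    except OSError:
        # Anything else, e.g. name resolution failure or no route to host.
        return "error"


if __name__ == "__main__":
    # Two hosts taken from the lists above, one from each group.
    for host in ("w20-575-80.mit.edu", "simmons-1.mit.edu"):
        print(host, probe(host, ATHINFO_PORT))
```

A "refused" result would match the first group and a "timeout" result
the second.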

For either or both groups, does anyone want to investigate these at
all, or should I just request that hotline re-install them?


There are also 66 cluster/dorm machines that are down according to
nagios.  Has release-team completed any post-release walk-through that
they were planning?  (I'll note that 66 is only a little higher than
average, though the average itself feels like it's a lot higher than
it ought to be.)

https://clusters.mit.edu/cluster/cgi-bin/status.cgi?hostgroup=all&style=hostdetail&hoststatustypes=4&hostprops=42

	Jonathon
