Date: Fri, 15 Oct 2010 16:38:25 -0400
Message-Id: <201010152038.o9FKcPFl013694@distraction.mit.edu>
To: acis-team@MIT.EDU, release-team@MIT.EDU
From: Jonathon Weiss <jweiss@MIT.EDU>

I've sent this out before, but I think some new people have joined
these lists since then.

We have a nagios installation that monitors the cluster/dorm/etc
workstations and printers. This nagios installation is set up not to
send any notifications; it is meant to be monitored only via the web.
We have a wiki page with links to some of the most interesting nagios
pages.

https://sowiki.mit.edu/wiki/index.php/Info:Cluster_Nagios

On the "Hosts that don't ping" page there are about 60 machines
listed. I'm fairly certain that some of them don't actually exist
anymore. If you let me know which ones don't exist, I'll delete (or
reserve) them in moira, and they'll fall out of monitoring.

The "Hosts that are up but have a serious problem" page lists things
that can be software problems, though some of them may be false
positives. At least initially, I think this page will be more useful
to the release-team folks than to the acis-team folks. Longer term we
may be able to make this more transparent to the acis folks.

The "Hosts that are up but have a mild problem" page is similar, but
with less important messages.

--
Jonathon