[105541] in North American Network Operators' Group


Re: OS, Hardware, Network - Logging, Monitoring, and Alerting

daemon@ATHENA.MIT.EDU (Paul Armstrong)
Fri Jun 27 01:34:30 2008

Date: Fri, 27 Jun 2008 05:34:18 +0000
From: Paul Armstrong <psa@otoh.org>
To: "Rev. Jeffrey Paul" <sneak@datavibe.net>
In-Reply-To: <20080626092204.GR24243@datavibe.net>
Cc: nanog@nanog.org
Errors-To: nanog-bounces@nanog.org

At 2008-06-26T02:22-0700, Rev. Jeffrey Paul wrote:
> Other stuff we really need to keep an eye on is hardware - redundant
> PSU status in our 7204s and Dells, temperatures and voltages 

Do yourself a favor and monitor temperature in Celsius. Most gear only
does C, and people burn out routers when C and F get mixed ("I set the
alarm to 90, why didn't it shut down?" "Well, you should have set it to
30; the router only understands C.").
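
One way to avoid the mix-up is to convert every threshold to Celsius
before it goes anywhere near a device. A minimal sketch, assuming
thresholds arrive as a number plus a unit letter; the function names
here are made up for illustration and do not come from any particular
monitoring tool:

```python
def fahrenheit_to_celsius(deg_f: float) -> float:
    """Convert a Fahrenheit reading to Celsius."""
    return (deg_f - 32.0) * 5.0 / 9.0


def normalize_threshold(value: float, unit: str) -> float:
    """Return an alarm threshold in Celsius, whatever unit it was entered in.

    `unit` is assumed to be "C" or "F" (case-insensitive); anything else
    is rejected rather than guessed at.
    """
    unit = unit.strip().upper()
    if unit == "C":
        return float(value)
    if unit == "F":
        return fahrenheit_to_celsius(value)
    raise ValueError(f"unknown temperature unit: {unit!r}")
```

With this, normalize_threshold(90, "F") comes out at roughly 32.2, which
is the number the router actually needs to see.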

> 1) Is SNMP the best way to do this?  Obviously some of the data (service
> checks) will need to be collected other ways.
 
Pretty much.
With Net-SNMP in particular, you can hook in external commands and
scripts.

Check out the "Arbitrary Extension Commands" section of
http://www.net-snmp.org/docs/man/snmpd.conf.html
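
A rough sketch of such an external command, assuming a Linux host that
exposes a thermal zone under /sys/class/thermal (values in millidegrees
Celsius). The extension name "cpu-temp" and the script path are
hypothetical; the `extend` directive itself is the one described on the
man page above:

```python
#!/usr/bin/env python3
"""Hypothetical helper for an snmpd.conf line such as:

    extend cpu-temp /usr/local/bin/cpu_temp.py /sys/class/thermal/thermal_zone0/temp

The name "cpu-temp" and both paths are illustrative assumptions.
"""
import sys


def millidegrees_to_celsius(raw: str) -> float:
    """Convert a sysfs millidegree reading such as '45000\\n' to Celsius."""
    return int(raw.strip()) / 1000.0


def main(path: str) -> int:
    """Print the temperature in Celsius; return a non-zero status on failure."""
    try:
        with open(path) as handle:
            print(f"{millidegrees_to_celsius(handle.read()):.1f}")
    except (OSError, ValueError) as err:
        # Report the problem on stdout so it shows up in the SNMP output too.
        print(f"UNKNOWN: {err}")
        return 1
    return 0


# Run only when snmpd passes the sensor path as the single argument.
if __name__ == "__main__" and len(sys.argv) == 2:
    sys.exit(main(sys.argv[1]))
```

The script's output and exit status should then be readable through
NET-SNMP-EXTEND-MIB (nsExtendOutputFull and nsExtendResult), so the same
poller that watches the routers can pick them up.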

If you don't use SNMP for everything, you're still going to be stuck
hooking SNMP into whatever you do use, so that all your networking kit
and environmental monitors can be monitored.

> 2) Is there any good solution that does both logging/trending of this
> data and also notification/monitoring/alerting?  I've used both Nagios
> and Cacti in the past, and, due to the number of individual things being
> monitored (3-5 items per OS instance, 5-10 items per physical server,
> 10-50 things per network device), setting them both up independently
> seems like a huge pain.  Also, I've never really liked Nagios that much.

Take a look at OpenNMS....

> There's got to be a better way.  What do you guys use?
 
We wrote our own, but that's a company culture thing.

Paul
 
-- 
End dual-measurement, let's finish going metric!
http://gometric.us/
http://www.metric.org/

