[195334] in North American Network Operators' Group

Re: Temperature monitoring

daemon@ATHENA.MIT.EDU (Peter Beckman)
Tue Jul 18 22:33:21 2017

X-Original-To: nanog@nanog.org
Date: Tue, 18 Jul 2017 22:33:16 -0400
From: Peter Beckman <beckman@angryox.com>
To: Andrew Latham <lathama@gmail.com>
In-Reply-To: <CA+qj4S-5Vr+6uAAY+86YErnB6gsYm4CCUTGb3HC6i4tJf9vbfw@mail.gmail.com>
Cc: NANOG <nanog@nanog.org>
Errors-To: nanog-bounces@nanog.org

Agreed -- there are already tons of temp sensors throughout old and new
hardware. I've used SCSI drive queries via sdparm and, more recently,
hddtemp to get the current temperature of the drives. No need for SNMP or
iLO, though those can give you a more detailed picture where available.

You first monitor and record for 24 hours to get your baseline temp for a
given rack or server, then set your threshold, then let your monitoring
platform do the rest.
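
The baseline-then-threshold step can be sketched as below. The 5-degree
margin above the observed 24-hour maximum is only an assumed starting
point, not a recommendation from this thread, and the function names are
hypothetical.

```python
from statistics import mean


def summarize_baseline(samples):
    """Summarize temperature samples collected over the baseline window."""
    if not samples:
        raise ValueError("need at least one sample to build a baseline")
    return {"min": min(samples), "mean": mean(samples), "max": max(samples)}


def threshold_from_baseline(samples, margin=5):
    """Alert threshold: observed baseline maximum plus an assumed margin."""
    return summarize_baseline(samples)["max"] + margin


def is_over_threshold(temp, threshold):
    """True when a new reading meets or exceeds the alert threshold."""
    return temp >= threshold


# Example: readings in degrees C taken during a 24-hour baseline
baseline_samples = [34, 35, 36, 38, 35]
limit = threshold_from_baseline(baseline_samples)
```

In practice the threshold would be entered into the monitoring platform,
which then handles the polling and alerting.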

Since I use hosted dedicated servers, I don't want to pay for yet another
device. By monitoring only those disk temps, I've caught two cooling issues
before they became a crisis, one of which my hosting provider was not aware
of.

If you control the hardware, or at least have access to it, there should be
enough sensors to let you know at least something is causing a problem.

Beckman

On Thu, 13 Jul 2017, Andrew Latham wrote:

> On Thu, Jul 13, 2017 at 9:33 PM, Dovid Bender <dovid@telecurve.com> wrote:
>
>> All,
>>
>> We had an issue with a DC where temps were elevated. The one bit of
>> hardware that wasn't watched much was the one that sent out the initial
>> alert. Looking for recommendations on hardware that I can mount/hang in
>> each cabinet that is easy to set up and will alert us if temps go beyond a
>> certain point.
>>
>> TIA.
>>
>> Dovid
>>
>
> Almost everything has temperature sensors: switches, servers, and most
> modern PDUs. A dedicated solution is just creating the problem again in the
> future. Monitor the temps on everything and gain knowledge related to
> failure rates. Most companies with physical infrastructure could pay for
> another engineer to discover these unexpected expenses. Also note that
> modern air conditioning and refrigeration have SNMP or BACnet protocol
> support; just download the manual.
>
> -- 
> - Andrew "lathama" Latham -
>

---------------------------------------------------------------------------
Peter Beckman                                                  Internet Guy
beckman@angryox.com                                 http://www.angryox.com/
---------------------------------------------------------------------------
