[33860] in North American Network Operators' Group


Re: Monitoring highly redundant operations

daemon@ATHENA.MIT.EDU (Simon Lockhart)
Wed Jan 24 20:24:48 2001

To: Sean Donelan <sean@donelan.com>
Cc: nanog@merit.edu
In-Reply-To: Your message of "24 Jan 2001 14:31:20 PST."
             <20010124223120.26272.cpmta@c004.sfo.cp.net> 
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Date: Wed, 24 Jan 2001 23:43:53 +0000
Message-Id: <5552.980379833@sunf25>
From: Simon Lockhart <simonl@rd.bbc.co.uk>
Errors-To: owner-nanog-outgoing@merit.edu


>But he does raise an interesting problem.  How do you know if your
>highly redundant, diverse, etc. system has a problem?  With an ordinary
>system it's easy.  It stops working.  In a highly redundant system you
>can start losing critical components, but not be able to tell if
>your operation is in fact seriously compromised, because it continues
>to "work."

Indeed. We currently monitor each part of our operation from a monitoring 
station on our network. Under certain conditions, this can give us both 
false positives and false negatives:

- We've lost off-site routing. Our monitoring station can see all our 
nodes okay, so it thinks everything is fine, but no-one else can see them.

- We've lost routing to just the part of our network that hosts the 
monitoring station. It reports that everything is down, when in fact 
everything is working fine for the rest of the internet.

One way we plan to overcome these issues is to locate monitoring stations 
on other ISPs' networks at random places on the internet. If you correlate 
the results from these multiple monitoring stations, you get a better 
view of what the rest of the internet is seeing.
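
As a rough illustration of the correlation step (a minimal sketch only, not 
what we actually run), something like the following would do. It assumes each 
station reports a simple reachable/unreachable flag per node. The station 
names, node names, and the majority-vote rule are all assumptions made up for 
this example.

```python
def correlate(reports):
    """Combine per-station reachability into one verdict per node.

    reports: {station_name: {node_name: True if the station can reach it}}
    (hypothetical data shape, chosen for this sketch)
    """
    nodes = sorted({node for seen in reports.values() for node in seen})
    status = {}
    for node in nodes:
        votes = [seen[node] for seen in reports.values() if node in seen]
        up = sum(votes)
        # A strict majority decides; an even split is reported as "unknown".
        if 2 * up > len(votes):
            status[node] = "up"
        elif 2 * up < len(votes):
            status[node] = "down"
        else:
            status[node] = "unknown"
    return status


def outlier_stations(reports, status):
    """Stations that disagree with the majority on every decided node.

    Such a station has probably lost its own path to the network (or is the
    only one that still has a path), so its view should not be trusted alone.
    """
    outliers = []
    for station, seen in sorted(reports.items()):
        decided = [n for n in seen if status[n] != "unknown"]
        if decided and all((status[n] == "up") != seen[n] for n in decided):
            outliers.append(station)
    return outliers


if __name__ == "__main__":
    # Second case above: only the on-net station has lost routing.
    reports = {
        "on-net": {"www": False, "mail": False},
        "isp-a": {"www": True, "mail": True},
        "isp-b": {"www": True, "mail": True},
    }
    status = correlate(reports)
    print(status)
    print(outlier_stations(reports, status))
```

The first case (off-site routing lost) is the mirror image: the on-net 
station sees everything up, the off-net stations see everything down, and 
the on-net station again shows up as the outlier. A majority vote only helps 
if most stations are off-net and do not share a common upstream path.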

Simon
-- 
Simon Lockhart                       |   Tel: +44 (0)1737 839676 
Internet Engineering Manager         |   Fax: +44 (0)1737 839516 
BBC Internet Services                | Email: Simon.Lockhart@bbc.co.uk 
Kingswood Warren,Tadworth,Surrey,UK  |   URL: http://support.bbc.co.uk/



