[61752] in North American Network Operators' Group


Real network failure causes Was: What do you want your ISP to

daemon@ATHENA.MIT.EDU (Ian Mason)
Thu Sep 4 10:24:27 2003

Date: Thu, 04 Sep 2003 14:59:09 +0100
To: Rob Thomas <robt@cymru.com>,
	Johannes Ullrich <jullrich@euclidian.com>
From: Ian Mason <nanog@ian.co.uk>
Cc: NANOG <nanog@merit.edu>
In-Reply-To: <Pine.GSO.4.56.0309031615380.28717@dragon.sauron.net>
Errors-To: owner-nanog-outgoing@merit.edu


At 22:30 03/09/2003, Rob Thomas wrote:
[snip]

>effects.  We all know better.  Bugs aren't restricted only to
>products from Redmond, typos happen, and the performance hit can
>be quite painful.

In my experience, more network downtime is caused by configuration errors 
than by all other causes combined.

The best diagnostic tool I've ever had is a script I cobbled together over 
two hours one night. Once an hour, it simply collected all the router 
configs across the network, did a 'diff' between the current and last 
config, and if there were changes, emailed them to me, along with a TACACS+ 
log summary that showed who had logged into which router when.
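
For anyone wanting to try something similar, here is a minimal Python sketch 
of the diff step only. It is not the original script. The names diff_config 
and build_report are made up for illustration, and it assumes the configs 
have already been fetched into dicts keyed by router name. Collecting the 
configs, mailing the report and the TACACS+ summary are left out.

```python
import difflib


def diff_config(router, previous, current):
    """Return a unified diff of one router's config, or '' if unchanged."""
    lines = difflib.unified_diff(
        previous.splitlines(),
        current.splitlines(),
        fromfile=f"{router} (last)",
        tofile=f"{router} (current)",
        lineterm="",
    )
    return "\n".join(lines)


def build_report(last_configs, current_configs):
    """Collect diffs for every router whose config changed since the last run.

    Both arguments are hypothetical dicts mapping router name -> config text.
    A router with no previous config is diffed against an empty string.
    """
    sections = []
    for router in sorted(current_configs):
        diff = diff_config(
            router, last_configs.get(router, ""), current_configs[router]
        )
        if diff:
            sections.append(diff)
    return "\n\n".join(sections)
```

Run from cron once an hour, a wrapper would call build_report() on the saved 
and freshly collected configs and send mail only when the result is non-empty.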

Experience with this quickly taught me to check these change summaries 
whenever a problem was escalated to me. Most of the time the problem was 
related to a config change, not an external cause. Further experience taught 
me to look out for one particular engineer's name in the logs, but that's 
another story.


