[47927] in North American Network Operators' Group
Network Reliability Engineering
daemon@ATHENA.MIT.EDU (Pete Kruckenberg)
Sat May 18 19:13:35 2002
Date: Sat, 18 May 2002 17:13:02 -0600 (MDT)
From: Pete Kruckenberg <pete@kruckenberg.com>
To: <nanog@merit.edu>
Message-ID: <Pine.LNX.4.33.0205181701090.32373-100000@minot.kruckenberg.com>

I'm looking for good reference material for doing
"reliability engineering" calculations and projections.
The goal is to justify increased redundancy with
quantifiable numbers based on MTBF data and other
reliability factors: a scientific justification rather
than the typical emotional appeal built on analyst/vendor
FUD.
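
A minimal sketch of the sort of calculation meant here, using
the standard steady-state availability formula
MTBF / (MTBF + MTTR) and assuming component failures are
independent (often optimistic in real networks). The function
names and the MTBF/MTTR figures are hypothetical placeholders,
not measured data.

```python
def availability(mtbf_hours, mttr_hours):
    """Steady-state availability of one component: MTBF / (MTBF + MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def series(*avails):
    """Availability of a chain in which every component must be up."""
    result = 1.0
    for a in avails:
        result *= a
    return result


def parallel(*avails):
    """Availability of redundant components where any one suffices.

    Assumes failures are independent, which ignores shared causes
    such as common power, software bugs, or operator error.
    """
    all_down = 1.0
    for a in avails:
        all_down *= (1.0 - a)
    return 1.0 - all_down


def downtime_minutes_per_year(avail):
    """Expected downtime per 365-day year, in minutes."""
    return (1.0 - avail) * 365 * 24 * 60


if __name__ == "__main__":
    # Hypothetical figures: 10,000 h MTBF, 4 h mean time to repair.
    single = availability(10_000, 4)
    redundant = parallel(single, single)
    print(f"single router:  {downtime_minutes_per_year(single):.1f} min/year")
    print(f"redundant pair: {downtime_minutes_per_year(redundant):.3f} min/year")
```

Comparing the two downtime figures against the cost of the
second unit gives a quantified argument, though the result is
only as good as the MTBF and MTTR inputs and the independence
assumption.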

I'd appreciate references on how to do this in a network
environment (what data to collect, how to collect it, how to
analyze it, etc.). I'd also welcome any data (or rules of
thumb) on typical MTBFs for network events that I won't find
on vendor product slicks (for example, the MTBF of IOS, or
of human-caused service outages of various types).

If someone has put together something remotely like this
that they'd care to share, that'd be incredibly helpful.

Thanks.
Pete.