[161215] in North American Network Operators' Group
Re: Cloudflare is down
daemon@ATHENA.MIT.EDU (Saku Ytti)
Mon Mar 4 13:41:21 2013
Date: Mon, 4 Mar 2013 20:40:58 +0200
From: Saku Ytti <saku@ytti.fi>
To: nanog@nanog.org
In-Reply-To: <CAPWAtbKn2xMUoLOXzdyZ-1qhN6bix_at3LAJ5z8_Lesn-tUjjg@mail.gmail.com>
On (2013-03-04 13:23 -0500), Jeff Wheeler wrote:
> We have lots of stupid people in our industry because so few
> understand "The Way Things Work."
We have a tendency to view the mistakes we make as unavoidable human error
and the mistakes other people make as avoidable stupidity.
We should actively plan for mistakes and errors. If you actively plan for no
'stupid mistakes', you're gonna have a bad time.
From my point of view, outages are caused by:
1) operator
2) software defect
3) hardware defect
Most people design only against 3), often with a design which actually
increases the likelihood of 2) and 1), reducing the overall MTBF of a design
which, strictly in theory, should increase it.
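
To put rough numbers on that, here is a minimal sketch. Every rate in it is
made up for illustration, not measured, and it assumes the three causes are
independent so that their outage rates simply add:

```python
HOURS_PER_YEAR = 24 * 365


def mtbf_hours(outages_per_year):
    """Overall MTBF in hours, assuming independent causes whose rates add."""
    return HOURS_PER_YEAR / sum(outages_per_year.values())


# Hypothetical outage rates per year for a simple, non-redundant design.
simple = {"operator": 0.5, "software": 0.5, "hardware": 1.0}

# Hypothetical rates for a redundant design: hardware outages drop a lot,
# but the added complexity is assumed to raise operator and software rates.
redundant = {"operator": 1.0, "software": 1.25, "hardware": 0.25}

simple_mtbf = mtbf_hours(simple)
redundant_mtbf = mtbf_hours(redundant)

print(f"simple:    {simple_mtbf:.0f} h between outages")
print(f"redundant: {redundant_mtbf:.0f} h between outages")
```

With these assumed numbers the redundant design cuts hardware outages to a
quarter yet ends up with the lower overall MTBF, because 1) and 2) grew by
more than 3) shrank.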
--
++ytti