[161215] in North American Network Operators' Group

Re: Cloudflare is down

daemon@ATHENA.MIT.EDU (Saku Ytti)
Mon Mar 4 13:41:21 2013

Date: Mon, 4 Mar 2013 20:40:58 +0200
From: Saku Ytti <saku@ytti.fi>
To: nanog@nanog.org
In-Reply-To: <CAPWAtbKn2xMUoLOXzdyZ-1qhN6bix_at3LAJ5z8_Lesn-tUjjg@mail.gmail.com>
Errors-To: nanog-bounces+nanog.discuss=bloom-picayune.mit.edu@nanog.org

On (2013-03-04 13:23 -0500), Jeff Wheeler wrote:
 
> We have lots of stupid people in our industry because so few
> understand "The Way Things Work."

We have a tendency to view the mistakes we make as unavoidable human errors
and the mistakes other people make as avoidable stupidity.

We should actively plan for mistakes and errors. If you actively plan for no
'stupid mistakes', you're gonna have a bad time.

From my point of view, outages are caused by:
1) operator
2) software defect
3) hardware defect

Most people design only against 3), often with a design that actually
increases the likelihood of 2) and 1), reducing the overall MTBF of a design
that, strictly in theory, increases it.
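
A rough sketch of that trade-off, with made-up numbers. It assumes the three
causes are independent and each has a constant failure rate, so the rates add
and the overall MTBF is the reciprocal of the sum. The function name and every
MTBF figure below are hypothetical, chosen only to illustrate the shape of the
argument:

```python
# Illustrative only: all MTBF figures below are invented, not measured.

def combined_mtbf(*cause_mtbfs_hours):
    """Overall MTBF when independent failure causes compete.

    Assumes each cause has a constant failure rate (1 / MTBF), so the
    rates add and the overall MTBF is the reciprocal of their sum.
    """
    total_rate = sum(1.0 / m for m in cause_mtbfs_hours)
    return 1.0 / total_rate


# Hypothetical simple design: operator, software, hardware (hours).
simple = combined_mtbf(20_000, 40_000, 50_000)

# Hypothetical "redundant" design: hardware outages become ten times rarer,
# but the added complexity makes operator and software outages more frequent.
redundant = combined_mtbf(10_000, 15_000, 500_000)

print(f"simple:    {simple:,.0f} h")
print(f"redundant: {redundant:,.0f} h")
```

With these assumed inputs the redundant design ends up with the lower overall
MTBF, even though its hardware MTBF is ten times better. Different inputs can
of course flip the result; the point is only that all three terms count.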

-- 
  ++ytti

