[161203] in North American Network Operators' Group
Re: Cloudflare is down
daemon@ATHENA.MIT.EDU (Leo Bicknell)
Mon Mar 4 09:51:46 2013
Date: Mon, 4 Mar 2013 06:51:31 -0800
From: Leo Bicknell <bicknell@ufp.org>
To: nanog@nanog.org
Mail-Followup-To: nanog@nanog.org
In-Reply-To: <20130304073113.GA10384@pob.ytti.fi>
Errors-To: nanog-bounces+nanog.discuss=bloom-picayune.mit.edu@nanog.org
In a message written on Mon, Mar 04, 2013 at 09:31:13AM +0200, Saku Ytti wrote:
> Probably only thing you could have done to plan against this, would have
> been to have solid dual-vendor strategy, to presume that sooner or later,
> software defect will take one vendor completely out. And maybe they did
> plan for it, but decided dual-vendor costs more than the rare outages.
From what I have heard so far, there is something else they could
have done: hire higher-quality people.
Any competent network admin would have stopped and questioned a
90,000+ byte packet and done more investigation. Competent programmers
writing their internal tools would have flagged that data as out
of range.
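A minimal sketch of the kind of range check meant here, assuming an
internal tool that accepts a packet-length match range before turning it
into a filter rule. The function name and the 20-byte lower bound are
illustrative assumptions, not anything from Cloudflare's tooling. The
upper bound follows from the 16-bit IPv4 Total Length field, which caps
a packet at 65,535 bytes.

```python
# Hypothetical sanity check for an internal rule-generation tool.
# Names and the lower bound are assumptions made for illustration only.

# The IPv4 Total Length field is 16 bits wide, so no IPv4 packet can
# be longer than 65,535 bytes.
IPV4_MAX_PACKET_BYTES = 65_535
# Smallest possible IPv4 packet: a 20-byte header with no payload.
IPV4_MIN_PACKET_BYTES = 20


def validate_packet_length_range(low: int, high: int) -> None:
    """Raise ValueError if the match range cannot occur on the wire."""
    if low > high:
        raise ValueError(f"empty range: {low} > {high}")
    if low < IPV4_MIN_PACKET_BYTES or high > IPV4_MAX_PACKET_BYTES:
        raise ValueError(
            f"packet length range {low}-{high} is outside "
            f"{IPV4_MIN_PACKET_BYTES}-{IPV4_MAX_PACKET_BYTES} bytes"
        )


if __name__ == "__main__":
    validate_packet_length_range(40, 1500)  # plausible range, accepted
    try:
        validate_packet_length_range(90_000, 90_100)
    except ValueError as err:
        print(f"rejected: {err}")
```

A check like this only catches the one class of bad input it was written
for. It does not replace someone looking at the data and asking whether
it makes sense.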
I can't tell you how many times I've sat in a post-mortem meeting
about some issue and the answer from senior management is "why don't
you just provide a script to our NOC guys, so the next time they
can run it and make it all better?" Of course it's easy to say
that once the smart people have diagnosed the problem!
You can buy these "scripts" for almost any profession. There are
manuals on how to fix everything on a car, and treatment plans for
almost every disease. Yet most people intuitively understand you
take your car to a mechanic and your body to a doctor for the proper
diagnosis. The primary thing you're paying for is expertise in
what to fix, not how to fix it. That takes experience and training.
But somehow it doesn't sink in with networking. I would not at all
be surprised to hear that someone over at Cloudflare right now is
saying "let's make a script to check the packet size" as if that
will fix the problem. It won't. Next time the issue will be
different, and the same undertrained person who missed the packet
size this time will miss the next issue as well. They should all be
sitting around saying, "how can we hire competent network admins for
our NOC?", but that would cost real money.
-- 
Leo Bicknell - bicknell@ufp.org - CCIE 3440
PGP keys at http://www.ufp.org/~bicknell/