[129982] in North American Network Operators' Group


Re: Facebook Engineering on today's outage

daemon@ATHENA.MIT.EDU (Chris Woodfield)
Fri Sep 24 16:43:47 2010

From: Chris Woodfield <rekoil@semihuman.com>
In-Reply-To: <3355550.3590.1285294632830.JavaMail.root@benjamin.baylink.com>
Date: Fri, 24 Sep 2010 13:43:23 -0700
To: Jay R. Ashworth <jra@baylink.com>
Cc: outages@outages.org, nanog@nanog.org
Errors-To: nanog-bounces+nanog.discuss=bloom-picayune.mit.edu@nanog.org

Agreed; my reading of this suggests database caching issues (i.e. all
the frontend/middleware clients hitting the main SQL cluster at once
instead of the memcached farm they normally use), not HTTP/CDN caching
issues.

-C

On Sep 23, 2010, at 7:17:12 PM, Jay R. Ashworth wrote:

> http://www.facebook.com/notes/facebook-engineering/more-details-on-todays-outage/431441338919
>
> Apparently, our surmise about Akamai notwithstanding, the problem was actually
> internal to their app-specific caching facilities, which went into Sorcerer's
> Apprentice mode, and they had to kill them all and let ghod sort them out.
>
> More if I get it; hope that posting's public.
>
> Cheers,
> -- jra
>
> -- 
> Jay R. Ashworth                   Baylink                      jra@baylink.com
> Designer                     The Things I Think                       RFC 2100
> Ashworth & Associates     http://baylink.pitas.com                     '87 e24
> St Petersburg FL USA      http://photo.imageinc.us             +1 727 647 1274
>
>    Start a man a fire, and he'll be warm all night.
>     Set a man on fire, and he'll be warm for the rest of his life.
>


