[87008] in North American Network Operators' Group

Re: Akamai server reliability

daemon@ATHENA.MIT.EDU (Vinny Abello)
Mon Nov 28 14:02:53 2005

Date: Mon, 28 Nov 2005 14:02:17 -0500
To: Roy <garlic@garlic.com>
From: Vinny Abello <vinny@tellurian.com>
Cc: nanog@merit.edu
In-Reply-To: <438B4EF7.7090704@garlic.com>
Errors-To: owner-nanog@merit.edu


At 01:39 PM 11/28/2005, Roy wrote:

>Hi,
>
>Many moons ago, we got a set of Akamai servers.  Over the years I 
>think they replaced every one of them at least once.  Last August we 
>got another set of servers due to a move and now two of those 
>three servers have failed.
>I still have the original server that started garlic.com in 
>production after 11+ years so I know servers can last a long 
>time.  I don't understand why Akamai failure rates are so high.
>
>Is anyone else seeing high failure rates of Akamai servers at their 
>facilities?

Of the three Akamai servers we have, I think two have been 
replaced in the three or four years we've had them. 
One was replaced several times. The replacement servers tend to be 
refurbished and I've seen multiple things wrong with them when they 
arrive. If I recall correctly, one replacement wouldn't even boot 
successfully... Just kept crashing. Reloading the OS from an Akamai 
recovery CD had no effect. Shipping also causes problems, since 
parts can come loose in transit.

The most common problem we see is failed hard drives and/or SCSI bus 
errors which are likely related to the hard drive failures. I'm 
surprised Akamai doesn't have any hardware RAID with hot swap yet (at 
least not in the boxes we have). It would be much less costly for 
them to ship a new hard drive than a whole new server each time a 
hard drive fails. I know the idea is to have very cheap boxes in 
clusters, but I wonder how much they're paying in shipping for 
replacing the cheap hardware.

Lately, we've had no known problems with our Akamai boxes. That 
one box does occasionally have weird SCSI hangs, whereas the other 
two work nonstop. For the most part it is fine, though.



Vinny Abello
Network Engineer
Server Management
vinny@tellurian.com
(973)300-9211 x 125
(973)940-6125 (Direct)
PGP Key Fingerprint: 3BC5 9A48 FC78 03D3 82E0  E935 5325 FBCB 0100 977A

Tellurian Networks - The Ultimate Internet Connection
http://www.tellurian.com (888)TELLURIAN

"Courage is resistance to fear, mastery of fear - not absence of 
fear" -- Mark Twain

