[176623] in North American Network Operators' Group


Re: A case against vendor-locking optical modules

daemon@ATHENA.MIT.EDU (Chuck Anderson)
Sat Dec 6 08:37:10 2014

X-Original-To: nanog@nanog.org
Date: Sat, 6 Dec 2014 08:37:01 -0500
From: Chuck Anderson <cra@WPI.EDU>
To: nanog@nanog.org
Mail-Followup-To: nanog@nanog.org
In-Reply-To: <20141206095156.GB11032@pob.ytti.fi>
Errors-To: nanog-bounces@nanog.org

On Sat, Dec 06, 2014 at 11:51:56AM +0200, Saku Ytti wrote:
> a) One particular optic had slow I2C, and the vendor polled it more
> aggressively than it could respond. The vendor's polling code didn't handle
> errors reading from I2C, but instead crashed the whole linecard control-plane.
> The vendor claimed it's not a bug, because it didn't happen on their optic. I
> tried to explain to them that they cannot guarantee that I2C reads won't fail
> on their own optics, and that it's a serious problem, but was unable to
> convince them to fix it.
> Now I am in possession of a good bunch of SFPs I can stick into your routers
> in colo, have them crash, and you won't have any clue why they crashed.
> 
> b) A particular vendor had a bug in their SFP microcontroller where, after
> 2**31 hundredths of a second had passed, it started to write its uptime to the
> location from which DDM temperature measurements are read. This was obvious
> from graphs, because the temperature went linearly from -127 ... 127, then
> jumped back to -127.
> These optics caused no problems when seated in Vendor1, but when seated in
> Vendor2 they caused link flapping, even two boxes away! (A-B-C, A having the
> problematic optic, B-C might flap). Coincidentally Vendor2 is the same as in
> case a), and they didn't consider this a bug in their code.
> This was particularly funny: if you rebooted 100 boxes in a maintenance
> window, the bug would trigger at the same moment, 2**31 hundredths of a second
> later, causing a potentially major outage.
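
For scale, the 2**31 hundredths of a second in case b) works out to a bit under
249 days of uptime. A quick check, assuming the counter starts at zero at
power-on and ticks at exactly 100 Hz as described (the function and constant
names are mine, purely for illustration):

```python
from datetime import timedelta

TICKS_PER_SECOND = 100   # from the quoted description: counter unit is 1/100 s
WRAP_TICKS = 2**31       # from the quoted description: bug appears after this many ticks


def time_to_trigger(ticks=WRAP_TICKS, hz=TICKS_PER_SECOND):
    """Return the uptime at which a counter running at `hz` reaches `ticks`."""
    # Work in integer milliseconds so no floating-point rounding creeps in.
    return timedelta(milliseconds=ticks * 1000 // hz)


if __name__ == "__main__":
    print(time_to_trigger())  # 248 days, 13:13:56.480000
```

So boxes rebooted together in one maintenance window would all hit the
condition about eight months later, within whatever spread the reboots had.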

Who is Vendor2?
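
On case a), the behaviour one would hope for is that a failed I2C read is
treated as a normal event: count it, back off, and keep the control-plane
running. A minimal sketch of that idea follows. It is not any vendor's actual
code; OpticPoller and read_fn are hypothetical names, and read_fn stands in for
whatever routine performs the I2C read and raises OSError on failure.

```python
class OpticPoller:
    """Polls one optic and tolerates read failures (hypothetical sketch)."""

    def __init__(self, read_fn, base_interval=1.0, max_interval=60.0):
        self.read_fn = read_fn              # assumed: callable that raises OSError on I2C failure
        self.base_interval = base_interval  # seconds between polls when healthy
        self.max_interval = max_interval    # upper bound for the backoff
        self.interval = base_interval
        self.consecutive_failures = 0

    def poll_once(self):
        """Try one read; return the value, or None if the read failed."""
        try:
            value = self.read_fn()
        except OSError:
            # A slow or broken optic must not take the caller down:
            # record the failure and poll less aggressively next time.
            self.consecutive_failures += 1
            self.interval = min(self.interval * 2, self.max_interval)
            return None
        # Successful read: reset failure tracking and the poll interval.
        self.consecutive_failures = 0
        self.interval = self.base_interval
        return value
```

The caller would sleep for poller.interval between calls to poll_once(), and
could raise an alarm once consecutive_failures passes some threshold instead of
crashing.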
