[755] in resnet


3com switches' "all lights on" mode

daemon@ATHENA.MIT.EDU (Martin Hamilton)
Fri Feb 1 15:04:07 2002

Message-ID:  <E16Wjl9-0003A3-00@gadget.lut.ac.uk>
Date:         Fri, 1 Feb 2002 19:54:07 +0000
Reply-To: Resnet Forum <RESNET-L@listserv.nd.edu>
From: Martin Hamilton <martin@NET.LUT.AC.UK>
To: RESNET-L@listserv.nd.edu

Hi, I posted this to comp.dcom.lans.ethernet, but then it struck me
that folk here might have had very similar experiences...

Any thoughts as to remedies would be greatly appreciated.  Our current
approach has been to replace the duff 3com switches with HP 2524s!

We're a big University campus site, with some 5000 network points in
the student bedrooms, mostly plumbed into 3com 610s, then Cisco 3524s,
with Cisco 3508Gs as the backbone.

Since around the time that Nimda came out (we eventually realised :-),
our 610s have frequently been failing in a very distinctive fashion.
What happens is that a box which has been happily switching away for
ages suddenly stops switching completely, and all of the lights on the
front (24 x 10BASE-T + 2 x 10/100BASE-TX) come on.

Sometimes one of the port lights will be blinking, which is apparently
3com-speak for "this is a duff port".  It seems that once the switch
has detected what it thinks is a duff port it may stop switching
completely, presumably in an effort to avoid sending garbage out to
the rest of the network.

Once a switch displays "all lights on", there doesn't seem to be any
way of recovering it.  We took one of the failed switches apart
looking for a "reset" jumper or somesuchlike, but no diggity.

Typically, prior to a switch going into this state we will have seen
it stop switching from time to time - but without "all lights on".

We also found that (even with the latest 2.68 firmware for these
switches), we can instantly cause our switches to lock up by running
Dug Song's "macof" program.  This generates a flood of packets with
random MAC addresses for testing purposes.
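
To give a feel for why this hurts, here is a minimal Python sketch of
the idea only. It is not macof's actual code and it sends nothing on
the wire; the function names are made up for illustration. It just
shows that a stream of frames with random source addresses presents
the switch with a new address to learn on almost every frame.

```python
import random


def random_mac(rng: random.Random) -> str:
    """Return one random 48-bit MAC address as colon-separated hex."""
    octets = [rng.randrange(256) for _ in range(6)]
    return ":".join(f"{o:02x}" for o in octets)


def distinct_sources(count: int, seed: int = 0) -> set[str]:
    """Generate `count` random source addresses and return the distinct ones.

    Hypothetical helper: each address stands in for the source MAC of one
    flooded frame, i.e. one more entry the switch would try to learn.
    """
    rng = random.Random(seed)
    return {random_mac(rng) for _ in range(count)}


if __name__ == "__main__":
    macs = distinct_sources(10000)
    print(len(macs), "distinct source addresses out of 10000 frames")
```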

By trial and error we discovered that it was usually possible to
"wibble" the next upstream switch and recover the duff 610.  I've
appended my "guide to wibbling" in case it's useful to anyone else.
Note that this also works in the macof case.

My suspicion is that Nimda infections (of which we have had a very
large number due to students being determined not to run virus
checkers) are responsible for flooding the ARP cache of these switches
and confusing some bit of the 3com firmware in the process.

3com's user-facing support people fobbed us off with a "we don't make
this product any more", which is fair enough I suppose!

We're about to apply some router ACLs which will block Nimda infection
attempts across VLANs via HTTP and SMB, so fingers crossed on this
one...

Cheers,

Martin


Martin's mini guide to wibbling
===============================

It's possible to spot quite easily when a switch in the hall service
really has failed:

If it's directly fed by a Cisco, do the following:

  show mac-address-table interface fastethernet 0/1

 ...for a device attached to interface 0/1, of course.  If the
resulting table is empty or all entries are "static", the downstream
switch is no longer in the land of the living.
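
If you capture that output to a file, the "empty or all static" test
is easy to script.  A minimal Python sketch follows, assuming each
table entry appears on one line containing a Cisco-style dotted MAC
address and the word "Dynamic" or "Static".  The exact column layout
varies between IOS versions, so treat the sample text and the function
name as hypothetical.

```python
import re

# Cisco-style dotted MAC address, e.g. 0050.04ab.cdef
MAC_RE = re.compile(r"\b[0-9a-f]{4}\.[0-9a-f]{4}\.[0-9a-f]{4}\b", re.IGNORECASE)


def downstream_looks_dead(table_text: str) -> bool:
    """Return True if the table is empty or holds only static entries."""
    for line in table_text.splitlines():
        if MAC_RE.search(line) and "static" not in line.lower():
            return False  # at least one learned (non-static) address
    return True


# Made-up sample output, for illustration only.
SAMPLE_ALIVE = """\
0050.04ab.cdef   Dynamic   1   FastEthernet0/1
0010.5a12.3456   Dynamic   1   FastEthernet0/1
"""
SAMPLE_DEAD = """\
0000.0c07.ac01   Static    1   FastEthernet0/1
"""

if __name__ == "__main__":
    print("alive sample dead?", downstream_looks_dead(SAMPLE_ALIVE))
    print("dead sample dead? ", downstream_looks_dead(SAMPLE_DEAD))
```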

Likewise on a cascaded stack of 3com switches:

  bridge port address list 26

will give you the MAC addresses seen on the stacking link between one
switch and the next.  If the result is:

  No Addresses found for this port

then any subsequent switches in the chain and devices connected to
them are no longer accessible.
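
The same check can be scripted for a stack.  This Python sketch
assumes you have already collected the output of "bridge port address
list 26" from each 610 in cascade order (how you collect it is up to
you).  The switch names and the helper function are hypothetical; only
the "No Addresses found" message comes from the switches themselves.

```python
from typing import Optional

NO_ADDRESSES = "No Addresses found for this port"


def first_dead_link(outputs: list[tuple[str, str]]) -> Optional[str]:
    """Return the first switch whose stacking link shows no addresses.

    `outputs` is a list of (switch_name, command_output) pairs in cascade
    order.  Returns None if every stacking link has learned addresses.
    """
    for name, text in outputs:
        if NO_ADDRESSES in text:
            return name
    return None


if __name__ == "__main__":
    # Made-up names and output, for illustration only.
    collected = [
        ("hall-a-1", "00:50:04:ab:cd:ef\n00:10:5a:12:34:56\n"),
        ("hall-a-2", "No Addresses found for this port\n"),
    ]
    print("first dead stacking link seen from:", first_dead_link(collected))
```

Everything downstream of the switch it names is the place to start
wibbling.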

Good tactics for recovering switches from the Cisco end are:

  configure terminal
  interface fastethernet 0/1
  shutdown
  end

then wait a moment or two, followed by:

  configure terminal
  interface fastethernet 0/1
  no shutdown
  end
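
If you end up doing this a lot, the two command sequences are easy to
generate per port.  A small Python sketch, which only builds the text
shown above and leaves it to you to paste or feed to the switch (the
function name is made up, and nothing here talks to a device):

```python
def bounce_commands(interface: str) -> tuple[list[str], list[str]]:
    """Return the (shut down, bring up) command lists for one interface."""
    down = ["configure terminal", f"interface {interface}", "shutdown", "end"]
    up = ["configure terminal", f"interface {interface}", "no shutdown", "end"]
    return down, up


if __name__ == "__main__":
    down, up = bounce_commands("fastethernet 0/1")
    print("\n".join(down))
    print("! wait a moment or two")
    print("\n".join(up))
```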

You may also find that changing speed and duplex settings has some
effect, as per:

  configure terminal
  interface fastethernet 0/1
  no speed
  no duplex
  end

then back:

  configure terminal
  interface fastethernet 0/1
  speed 100
  duplex full
  end

In the case of 3com stacks, this is often successful:

  ethernet autonegotiation 26 enable

then

  ethernet autonegotiation 26 disable

However, you may find that you need to restart the 610, as per:

  system reset

And sometimes it's possible to recover the duff switch(es) by pinging
the first unresponsive switch in the stack from another 610 on
the same VLAN, e.g.

  ip ping A.B.C.D

