[200] in Hotline Meeting

Tonight's building 11 network problems.

daemon@ATHENA.MIT.EDU (Jeffrey I. Schiller)
Sat Apr 28 02:23:03 1990

Date: Sat, 28 Apr 90 02:22:17 -0400
From: Jeffrey I. Schiller <jis@MIT.EDU>
To: hotline@MIT.EDU, ops@MIT.EDU, network@MIT.EDU
Cc: jdb@MIT.EDU, cec@MIT.EDU, orcutt@MIT.EDU, roden@MIT.EDU

	Someone in Ocean Engineering misconfigured an Iris workstation
in such a fashion that when they ran "/etc/timed", it sent a packet
that in essence was a "data virus." This packet caused every
workstation, and any other system which uses the "timed" system to
keep track of time, to begin transmitting broadcast packets on the
network as fast as they could!

	Ron and I spent about 5 hours tracking down the machines that
were hosing the network and "fixing" them. We were unable to find the
machine named "MECHE2," an Athena workstation that has been moved from
its last known location. Luckily we were able to fix it remotely (by
literally breaking the Ethernet, but that is another story!).

	We were unable to stop the runaway workstations in Joe
Ferreira's area on the 5th floor of building 9. None of the machines
there were where we expected them to be. As an example, a machine we
have in our database as a PC (named "crlpc-10") was in fact the RT in
Rob Smyser's office! For now we have isolated the building 9 segment
from the rest of the network. This means that none of the
workstations on the fifth floor of building 9 will function. The
folks there will have to find and fix the machines; we can reconnect
the network after that (though frankly I am tempted to require
complete documentation on what's up there, not to mention network
fees for machines we have no records of!).

	In summary, things were pretty hosed on the building 11 (18.80)
network from about 7:30PM until about 1AM. Things are back to normal
now, with the exception of the CRL (Joe's area). I have left a message
on Rob Smyser's home answering machine.

	I need to stare at some more of the source code for /etc/timed
to fully understand why the "Ethernet Meltdown" happened. When I have
done this I will probably generate a patch for timed that will make it
immune to this form of lossage in the future. I'll probably consult
with the Berkeley people as well.

			-Jeff
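
	For illustration only: one generic way a daemon can be kept
from flooding a network is to cap how often it may broadcast. The
sketch below is a token-bucket limiter in Python. It is not the patch
mentioned above, and it does not reflect how timed is actually
written; the class name BroadcastLimiter and its parameters are
hypothetical, and the root cause of the meltdown had not yet been
determined when this message was sent.

```python
import time


class BroadcastLimiter:
    """Hypothetical token-bucket guard for outgoing broadcasts.

    Allows at most `rate` broadcasts per second on average, with
    short bursts of up to `burst` packets.
    """

    def __init__(self, rate, burst, clock=time.monotonic):
        self.rate = float(rate)      # tokens added per second
        self.burst = float(burst)    # maximum tokens held at once
        self.clock = clock           # injectable for testing
        self.tokens = float(burst)   # start with a full bucket
        self.last = clock()

    def allow(self):
        """Return True if a broadcast may be sent right now."""
        now = self.clock()
        # Refill in proportion to elapsed time, capped at `burst`.
        self.tokens = min(self.burst,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

	A sender would call allow() before each broadcast and drop
(or defer) the packet when it returns False, so that even a
malformed trigger packet could not make the host transmit at wire
speed.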
