[200] in Hotline Meeting
Tonight's building 11 network problems.
daemon@ATHENA.MIT.EDU (Jeffrey I. Schiller)
Sat Apr 28 02:23:03 1990
Date: Sat, 28 Apr 90 02:22:17 -0400
From: Jeffrey I. Schiller <jis@MIT.EDU>
To: hotline@MIT.EDU, ops@MIT.EDU, network@MIT.EDU
Cc: jdb@MIT.EDU, cec@MIT.EDU, orcutt@MIT.EDU, roden@MIT.EDU
Someone in Ocean Engineering misconfigured an Iris workstation
in such a fashion that when they ran "/etc/timed", it sent a packet
that in essence was a "data virus." This packet caused every
workstation, and any other system which uses the "timed" system to
keep track of time, to begin transmitting broadcast packets on the
network as fast as they could!
Ron and I spent about 5 hours tracking down the machines that
were hosing the network and "fixing" them. We were unable to find the
machine named "MECHE2." This Athena workstation had been moved from
where we last knew it to be. Luckily we were able to fix it
remotely (by literally breaking the Ethernet! But that is another
story.)
We were unable to stop the runaway workstations in Joe
Ferreira's area on the 5th floor of building 9. None of the machines
there were where we expected them to be. For example, a machine we
have in our database as a PC (named "crlpc-10") was in fact the RT in
Rob Smyser's office! For now we have isolated the building 9 segment
from the rest of the network. This means that none of the
workstations on the fifth floor of building 9 will
function. The folks there will have to find and fix the machines; we
can reconnect the network after that (though frankly I am tempted to
require complete documentation on what's up there, not to mention
network fees for machines we have no records of!).
In summary, things were pretty hosed on the building 11 (18.80)
network from about 7:30PM until about 1AM. Things are back to normal
now, with the exception of the CRL (Joe's area). I have left a message
on Rob Smyser's home answering machine.
I need to stare at more of the source code for /etc/timed
to fully understand why the "Ethernet Meltdown" happened. When
I have done this I will probably generate a patch for timed that will
make it immune to this form of lossage in the future. I'll probably
consult with the Berkeley people as well.
-Jeff