[34396] in Hotline Meeting
Solaris 8.0 Mystery Deepens
daemon@ATHENA.MIT.EDU (Mike Barker)
Fri Aug 16 23:31:10 1996
From: "Mike Barker" <mbarker@MIT.EDU>
To: miki@MIT.EDU, ghudson@MIT.EDU
Cc: mbarker@MIT.EDU, hotline@MIT.EDU, ops@MIT.EDU, jis@MIT.EDU, tom@MIT.EDU
Date: Fri, 16 Aug 1996 23:31:04 EDT
For those who are just joining us - we have a mysterious problem.
the kernel on some workstations seems to be disappearing.
12 private solaris workstations have been found that had apparently
crashed and failed to reboot because /kernel was empty.
miki and greg were investigating a possible connection to the update
process and mkserv. the theory was that perhaps the update didn't complete
and the system was left without a kernel.
tonight Lou called me with the report that m4-167-3 (a public workstation)
was down in the same pattern. it had been down for a couple of days.
when we first tried to work with the system, it complained about the ethernet
connection. we checked the connectivity to the quad box, reseated the cables,
and this problem went away. I'm including this because there have been
reports that network disconnection may have been involved in other incidents.
by using ctrl-a, boot floppy -s, and mount /dev/dsk/c0t3d0s0 /mnt, I looked
at /etc/athena/version. The important information I think is:
7.7V
7.7W
7.7X
7.7Y
update Aug. 14 8:14
8.0I Aug. 14 8:31
(all entries were Workstation, not server - so no mkserv had been run)
ls -ld on the empty /kernel showed Aug. 14, 12:14
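
For anyone repeating this check on another downed machine, the inspection above could be scripted roughly as follows. This is only a sketch: it assumes the root slice is already mounted at /mnt as described, and the function name `inspect_mounted_root` and the `tail` parameter are made up for illustration.

```python
import os
from datetime import datetime


def inspect_mounted_root(mount_point="/mnt", tail=6):
    """Report whether <mount_point>/kernel is empty, when that directory
    was last modified, and the last few lines of etc/athena/version.

    Hypothetical helper; assumes the root slice is already mounted.
    """
    kernel_dir = os.path.join(mount_point, "kernel")
    version_path = os.path.join(mount_point, "etc", "athena", "version")

    report = {"kernel_empty": None, "kernel_mtime": None, "version_tail": []}

    if os.path.isdir(kernel_dir):
        # An empty directory listing is the failure pattern described above.
        report["kernel_empty"] = len(os.listdir(kernel_dir)) == 0
        # The directory mtime is the time ls -ld shows.
        report["kernel_mtime"] = datetime.fromtimestamp(
            os.stat(kernel_dir).st_mtime
        )

    if os.path.isfile(version_path):
        with open(version_path) as f:
            lines = [line.rstrip("\n") for line in f if line.strip()]
        # Only the most recent entries matter for the update timeline.
        report["version_tail"] = lines[-tail:]

    return report
```

Calling `inspect_mounted_root("/mnt")` returns a dictionary; `kernel_empty` stays `None` if there is no /kernel directory at all, which would be a different failure from the one seen here.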
I think this indicates that the update completed and the system ran for
nearly 4 hours before losing.
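
The "nearly 4 hours" figure can be checked with a little arithmetic. The sketch below assumes both timestamps come from the same clock, that 8:31 is in the morning, and that 12:14 is 24-hour time as ls normally prints it; the variable names are illustrative only.

```python
from datetime import datetime

# Last entry in /etc/athena/version (8.0I), assumed to be 8:31 am.
update_done = datetime(1996, 8, 14, 8, 31)

# Modification time of the empty /kernel directory from ls -ld.
kernel_emptied = datetime(1996, 8, 14, 12, 14)

gap = kernel_emptied - update_done
print(gap)  # 3:43:00
```

So the machine ran for 3 hours 43 minutes after the 8.0I entry was written, if those assumptions hold.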
Lou was wondering if someone could be deliberately doing this. Note
that at least some of the private workstations were in locked offices,
so if someone is causing this they are probably using network access
(or lockpicks).
While I can't imagine anyone doing this, could someone check whether
there was an su on this machine at the appropriate time on Wednesday?
I.e., could someone have simply decided to rm /kernel/* for fun and
confusion?
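
One way to do the su check, sketched under some assumptions: that the machine logs su attempts to a Solaris-style sulog (normally /var/adm/sulog), and that each line looks like `SU MM/DD HH:MM +|- tty fromuser-touser`. The log has no year field, so the year is passed in. The function name `su_attempts_in_window` is invented for this example.

```python
import re
from datetime import datetime

# Assumed sulog line shape: "SU 08/14 12:10 + pts/1 someuser-root"
SULOG_LINE = re.compile(
    r"^SU (\d\d)/(\d\d) (\d\d):(\d\d) ([+-]) (\S+) (\S+)-(\S+)$"
)


def su_attempts_in_window(lines, start, end, year=1996):
    """Return (time, succeeded, tty, from_user, to_user) for every su
    attempt logged between start and end, inclusive.

    Lines that do not match the assumed format are skipped.
    """
    hits = []
    for line in lines:
        m = SULOG_LINE.match(line.strip())
        if not m:
            continue
        month, day, hour, minute = (int(g) for g in m.group(1, 2, 3, 4))
        when = datetime(year, month, day, hour, minute)
        if start <= when <= end:
            hits.append(
                (when, m.group(5) == "+", m.group(6), m.group(7), m.group(8))
            )
    return hits
```

For this incident the window of interest would run from the end of the update (Aug. 14, 8:31) to the /kernel modification time (Aug. 14, 12:14), reading the lines from the sulog on the mounted disk. An empty result does not rule out tampering, since someone with root could also have edited the log.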
Conclusions:
1. The problem does NOT appear to be confined to private workstations.
   mkserv does NOT seem to be involved.
2. Lou checked the reports - we have had failures on both classics and sparc5s.
3. I don't have a reasonable theory as to what is going on.
Any ideas?
thanks
Mike
the hotline report, if it helps
/***** hector:hotline / priam!jmorzins / 5:52 pm Aug 16, 1996*/
From jmorzins@MIT.EDU Fri Aug 16 17:52:20 1996
To: hotline@MIT.EDU
Subject: m4-167-3
Mime-Version: 1.0
Content-Type: text/plain
Date: Fri, 16 Aug 1996 17:52:13 EDT
From: "Jacob Morzinski" <jmorzins@MIT.EDU>
Machine m4-167-3 (Sparc Classic) seems to be seriously broken. It
does not boot. When one tries to restart the computer, it
proclaims:
Boot device: /iommu/sbus/espdma@4,8400000/esp@4,8800000/sd@3,0 File and args:
Bad filename.
Enter filename [/kernel/unix]: