[3014] in testers
8.0C: savecore lossage
daemon@ATHENA.MIT.EDU (John Hawkinson)
Sun Jul 21 00:05:44 1996
Date: Sun, 21 Jul 1996 00:05:29 -0400
To: testers@MIT.EDU
From: John Hawkinson <jhawk@MIT.EDU>
Saving crash dumps doesn't seem to work quite right.
I booted portnoy single user after a crash and edited
/etc/init.d/sysetup to uncomment the crash dump stuff at the end:
##
## Default is to not do a savecore
##
if [ ! -d /var/crash/`uname -n` ]
then mkdir -p /var/crash/`uname -n`
fi
echo 'checking for crash dump...\c '
savecore /var/crash/`uname -n`
echo ''
I then started to boot multiuser. Savecore ran and copied
the dump from the swap area and then produced 3 errors:
savecore: Warning: can't find 'ufs' in module path: /kernel /usr/kernel
savecore: Warning: can't find 'ip' in module path: /kernel /usr/kernel
savecore: Warning: can't find 'udp' in module path: /kernel /usr/kernel
At first I thought this was because those kernel modules were in /afs,
hence my other bug report about loading afs by hand. It turns out
this wasn't true. Actually, they're in /kernel/drv and /kernel/fs.
Once those errors showed up, I immediately hit L1-A and booted single
user again, fscked, and tried to re-run savecore with -v (documented
option for verbosity). This was unsuccessful, as savecore reported
it had already been run.
The BSD savecore has traditionally allowed you to
override this behavior. On the assumption that the Solaris
savecore was the same, I stringsed it:
[portnoy!jhawk] /kernel> strings /bin/savecore | head -2
vdf:
%s: %m
Sure enough, the first returned string looks like a getopt
string. Using -vd caused savecore to ignore the fact
that a dump was already extracted and try again; unfortunately
it overwrote /var/crash/portnoy/vmcore.4. I'm not sure if it
did this because the original savecore did not properly
update the bounds file (perhaps because of the warnings?), or
because -d decides to ignore the bounds file somehow (then why didn't
it write .0?). I would like someone to:
1. Check the Solaris sources and clarify just
what -d does.
2. Complain to Sun (low priority) that -d is not documented.
Perhaps the most effective way would be to send them
a patch to savecore.1m; I'll be happy to write up reasonable
wording if someone else does (1) [doing (1) is hard for me :-)]
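For anyone not used to reading getopt strings, here is a minimal sketch of how "vdf:" would be parsed, assuming savecore passes it to getopt(3) unchanged. It only shows the syntax: a bare letter is a flag, and a letter followed by ':' takes an argument. It says nothing about what savecore does with each option; the function name and the "somefile" argument are made up for illustration.

```shell
#!/bin/sh
# Illustration only: parse arguments against the getopt-style string "vdf:".
# -v and -d are plain flags; -f expects an argument (the trailing ':').
# The meaning savecore attaches to each flag is NOT known from this.
describe_opts() {
    OPTIND=1    # reset so the function can be called more than once
    while getopts "vdf:" opt "$@"; do
        case "$opt" in
            v) echo "flag -v" ;;
            d) echo "flag -d" ;;
            f) echo "flag -f with argument: $OPTARG" ;;
            *) echo "unrecognized option" ;;
        esac
    done
}

# "somefile" is a hypothetical argument, used only to show the -f syntax.
describe_opts -vd -f somefile
```

So "-vd" is accepted as two separate flags, which matches what I saw, and there appears to be a third option, -f, that wants an argument.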
Anyhow, further investigation showed that the problem seemed to be that
savecore was seeing references to kernel modules as "ufs", "ip", and "udp",
and it was searching for those modules in /kernel and /usr/kernel.
The verbose output produced by -vd indicates that most kernel modules
loaded are referenced by relative paths including the subdirs:
# savecore -vd /var/crash/portnoy
System went down at Sat Jul 20 22:53:39 1996
Saving 5103 pages of image in vmcore.0
5103 pages saved.
Modules loaded at the time of crash:
/kernel/unix fs/specfs misc/swapgeneric
sched/TS sched/TS_DPTBL ufs
drv/rootnex drv/options drv/dma
drv/sbus drv/iommu drv/sad
drv/pseudo drv/sd misc/scsi
drv/esp fs/procfs sys/c2audit
misc/strplumb drv/clone ip
drv/tcp udp drv/icmp
So I think the bug here is that somehow those modules were loaded
with a different search path than that used by savecore.
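To pick the odd ones out of the module list above mechanically, something like the following should do. It is a rough sketch that assumes the list is whitespace-separated, as in the -v output, and treats any name without a '/' as suspect; the function name is mine.

```shell
#!/bin/sh
# Print module names that carry no subdirectory prefix (no '/' in the name).
# Input: the "Modules loaded" list on stdin, whitespace-separated.
bare_modules() {
    awk '{ for (i = 1; i <= NF; i++) if (index($i, "/") == 0) print $i }'
}

# The list exactly as savecore -vd printed it on portnoy.
bare_modules <<'EOF'
/kernel/unix fs/specfs misc/swapgeneric
sched/TS sched/TS_DPTBL ufs
drv/rootnex drv/options drv/dma
drv/sbus drv/iommu drv/sad
drv/pseudo drv/sd misc/scsi
drv/esp fs/procfs sys/c2audit
misc/strplumb drv/clone ip
drv/tcp udp drv/icmp
EOF
```

On the list above this prints ufs, ip, and udp, the same three names savecore warned about.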
The obvious workaround (which I used) was to make symlinks:
# pwd
/kernel
# ls -l ip udp ufs
lrwxrwxrwx 1 root root 6 Jul 20 23:14 ip -> drv/ip
lrwxrwxrwx 1 root root 7 Jul 20 23:14 udp -> drv/udp
lrwxrwxrwx 1 root root 6 Jul 20 23:14 ufs -> fs/ufs
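If the workaround has to be repeated on other machines, it could be scripted roughly as below. This is a sketch, not something I have run against a real /kernel: the function name is made up, it takes the first subdirectory that contains a file of the given name, and it leaves anything that already exists alone. If a name lives in more than one subdirectory, check the result by hand.

```shell
#!/bin/sh
# link_bare_modules ROOT NAME...
# For each NAME, look for ROOT/<subdir>/NAME and create ROOT/NAME as a
# relative symlink to the first match. Existing entries are not touched.
link_bare_modules() {
    root=$1
    shift
    for name in "$@"; do
        # Skip names that already exist in ROOT (file, dir, or symlink).
        if [ -e "$root/$name" ] || [ -h "$root/$name" ]; then
            continue
        fi
        for candidate in "$root"/*/"$name"; do
            if [ -f "$candidate" ]; then
                sub=$(basename "$(dirname "$candidate")")
                ln -s "$sub/$name" "$root/$name"
                echo "$name -> $sub/$name"
                break
            fi
        done
    done
}

# Intended use (left commented out; try it on a scratch copy first):
# link_bare_modules /kernel ip udp ufs
```

The root directory is a parameter rather than hard-wired to /kernel so that the script can be tried on a copy of the tree before touching the real one.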
I think the correct fix is to have the modules loaded with the full
path relative to /kernel. I'm not sure how to accomplish this --
kernel(1m) suggests /etc/system might be pertinent, but this doesn't
seem to actually be the case (though perhaps a better workaround could
be installed there).
The output of "sysdef" seems instructive. The Loadable Objects
section begins:
*
* Loadable Objects
*
unix
ufs
ip
udp
strmod/arp
drv/arp
drv/arp
...
which seems telling; something is wrong with /kernel/unix.
Further staring at Intro(9s) is not very helpful, so
I'll stop here. Hopefully someone can figure this out.
--jhawk