[1391] in linux-scsi channel archive
possible memory corruption problem ?
daemon@ATHENA.MIT.EDU (Michel LESPINASSE)
Thu Feb 6 01:38:21 1997
Date: Thu, 6 Feb 1997 07:33:47 +0100 (MET)
From: Michel LESPINASSE <walken@via.ecp.fr>
Reply-To: Michel LESPINASSE <walken@via.ecp.fr>
To: linux-scsi@vger.rutgers.edu
cc: aic7xxx@FreeBSD.org
Hi,
After installing an Adaptec AHA-2940UW host adapter in my machine, I
noticed that my computer is no longer stable under load.
I can reliably crash my computer by running an X-based MPEG player. This
alone isn't enough to point at the SCSI code as the source of the
problem (the X server could just as well be writing anywhere in memory);
however, I cannot reproduce the crash with my non-SCSI kernel. I have no
easy explanation for this, because my SCSI disk is NOT used during this
MPEG playback test.
The crash seems to be caused by memory corruption. I say this because
about half the time I can drive my machine into a state where the copy
of libc in the disk cache is corrupted. At that point every system binary
segfaults, but I can still use a statically linked shell and its built-in
commands (sash, for example). I then copy the corrupted, cached version
of libc to disk, and after a reboot I compare it with the original.
Sometimes there seems to be a pattern:
# here every corrupted byte is exactly 8 (octal 10) lower than the original (values in octal)
Studio:/root% cmp -l /lib/libc.so.5.4.20 libc.so.5.4.20.problem
57757 324 314
57853 10 0
58077 204 174
59805 60 50
59901 140 130
60125 320 310
zsh: 1593 exit 1 cmp -l /lib/libc.so.5.4.20 libc.so.5.4.20.problem
# but here I don't notice any particular pattern
Studio:/root% cmp -l /lib/libc.so.5.4.20 libc.so.5.4.20.problem2
1469 110 0
1470 4 0
3517 36 155
3518 6 2
135645 30 4
135646 203 12
137693 0 203
137694 203 304
zsh: 1594 exit 1 cmp -l /lib/libc.so.5.4.20 libc.so.5.4.20.problem2
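`cmp -l` prints each offset in decimal (1-based) but the two byte values in octal, so a pattern is easier to spot once each pair is converted. A minimal Python sketch for doing that (the function names are hypothetical, not part of any existing tool) that parses `cmp -l` output and reports, for each differing byte, the arithmetic difference and the XOR of the original and corrupted values:

```python
def parse_cmp_l(text):
    """Parse `cmp -l` output into (offset, good, bad) tuples.

    cmp -l prints a 1-based decimal offset followed by the two differing
    byte values in octal. Lines that don't have exactly three numeric
    fields (shell prompts, comments, zsh exit-status messages) are skipped.
    """
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3 or not all(f.isdigit() for f in fields):
            continue
        offset = int(fields[0])     # decimal offset
        good = int(fields[1], 8)    # original byte, octal
        bad = int(fields[2], 8)     # corrupted byte, octal
        rows.append((offset, good, bad))
    return rows


def describe(rows):
    """Return (offset, good - bad, good ^ bad) for each differing byte."""
    return [(off, good - bad, good ^ bad) for off, good, bad in rows]


def report(text):
    """Format one line per differing byte: offset, signed delta, XOR in binary."""
    return "\n".join(
        f"{off:>8}  delta={delta:+4d}  xor={xor:08b}"
        for off, delta, xor in describe(parse_cmp_l(text))
    )
```

The text can come from `sys.stdin.read()` with `cmp -l` piped in. For the first listing above every delta is +8; the XOR column helps distinguish a value that was decremented from one that had a single bit flipped.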
Another problem I've seen (this time the test directly involves the SCSI
driver):
I copied 600 MB of data from /dev/hda1 to /dev/sda1, then ran a
diff -r between the two directory trees. The diff runs fine for a while,
then my machine enters another strange state: the disks show very
little activity. They do nothing for about a second, then there is a SCSI
disk access quickly followed by an IDE disk access, then nothing for
another second, and the pattern repeats indefinitely. Meanwhile the CPU
spends about 90% of its time in system state.
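A system-time share like this can be computed from two samples of the aggregate `cpu` line of `/proc/stat`, whose first four counters are user, nice, system and idle time in jiffies. A minimal sketch, assuming that field order (the helper names are hypothetical):

```python
import time


def cpu_times(stat_text):
    """Return the jiffy counters from the aggregate 'cpu' line of /proc/stat."""
    for line in stat_text.splitlines():
        fields = line.split()
        # Match only the aggregate line, not per-CPU lines such as 'cpu0'.
        if fields and fields[0] == "cpu":
            return [int(x) for x in fields[1:]]
    raise ValueError("no aggregate 'cpu' line found")


def system_share(before, after):
    """Fraction of elapsed jiffies spent in system mode between two samples."""
    deltas = [a - b for b, a in zip(cpu_times(before), cpu_times(after))]
    total = sum(deltas)
    if total == 0:
        return 0.0
    return deltas[2] / total  # assumed order: user, nice, system, idle, ...


def sample_system_share(interval=1.0, path="/proc/stat"):
    """Read /proc/stat twice, `interval` seconds apart, and return the system share."""
    with open(path) as f:
        before = f.read()
    time.sleep(interval)
    with open(path) as f:
        after = f.read()
    return system_share(before, after)
```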
I'm using a 2.0.28 kernel with no special SCSI compile-time options.
Is this a known problem with the 2940 card, and is there a known
workaround? Does anyone have a possible explanation? The strange thing
is that although SCSI isn't used in the first test, removing the SCSI
driver from the kernel makes the problem go away.
Michel "Walken" LESPINASSE - Student at Ecole Centrale Paris (France)
www Email : walken@via.ecp.fr
(o o) VideoLan project : http://videolan.via.ecp.fr/
------oOO--(_)--OOo-------------------------------------------------------
Yow ! 1135 KB/s remote host TCP bandwidth over 10Mb/s ethernet. Beat that!