[7810] in linux-scsi channel archive
Re: Scuffed up CDROM locks up my SMP system. AHA154x module
daemon@ATHENA.MIT.EDU (Eric Youngdale)
Sun Jan 9 14:44:51 2000
Date: Sun, 9 Jan 2000 13:59:37 -0500 (EST)
From: Eric Youngdale <eric@andante.org>
To: Brynn Rogers <brynn@tiny.net>
Cc: linux-scsi@vger.rutgers.edu
In-Reply-To: <38780ECD.7D11EEF8@tiny.net>
Message-ID: <Pine.LNX.4.10.10001091337580.16899-200000@gwyn.tux.org>
On Sat, 8 Jan 2000, Brynn Rogers wrote:
> I have a Quad Pentium Pro running a BusLogic BT-956C and an Adaptec 1542B
> (as a module). The aha1542 has only an external CD jukebox with a
> Toshiba XM-3401TA in it. After I mount the bad disk (which works
> fine), when I try to do an ls -R on that disk
> my system will lock up hard, and the CPU activity lights will show
> only one CPU active. I am running the Red Hat 6.0 kernel (2.2.5-15smp).
> Is there some bug in the driver that lets this scratched-up CDROM lock
> the system?
Obviously the answer is yes, but offhand I don't know what it
might be. I found several bugs in the 2.3 series while running my own
tests with a scuffed-up CDROM. Once I had cleaned up the problems I
identified, I got it to the point where there were no bad side effects
at all.
Suffice it to say that the code in 2.2 and in 2.3 is radically
different, and bugs that I found in 2.3 might not apply. Many of the bugs
I found were specific to the new queueing code in 2.3, for example. That
being said, it shouldn't be *that* hard to bring the 2.2 kernel up to the
same level of reliability. FWIW, I was also using a 1542.
I would like it if you could take the lead in trying to figure out
what is wrong, as I am kind of tied up with 2.3-related things. If need
be, I could build and test a 2.2 kernel myself, but I would rather not at
the moment. For one thing, my test machine is uniprocessor (although I
tend to build with SMP turned on, as this is better at catching bugs), and
my dual-processor machine doesn't have the 1542. Here are a few pointers
from when I was doing this myself:
1) On my system, e2fsck on my partitions was painfully slow, so
whenever I suspected the system might crash, I first brought it down to
single-user mode. This is done with:
/sbin/telinit 1
One of the bugs I found in the 2.3 series kernel was that the error
handler thread shut down in such cases: on the way to single-user mode,
init sends SIGTERM and SIGKILL to every process, and the error handler
thread treated these signals as a request to exit. I believe that this bug
still exists in the 2.2 series kernels. Without an error handler thread,
anyone attempting to access the CDROM will be blocked forever. The
patches that I am enclosing correct this situation.
2) Unmount all but /, and remount / as read-only. This protects
your / partition from damage and speeds up the reboot.
3) Turn on logging. Usually something like:
echo "scsi log all" > /proc/scsi/scsi
will work, but only if the kernel was built with SCSI logging enabled;
otherwise you will need to rebuild the kernel with logging enabled.
4) Torture the CDROM. You will get tons of messages on the
console indicating what is going on and what the SCSI layer is trying to
do. When the system locks up, note down everything left on the screen.
Next, try switching virtual consoles to see if that works. Also, try
Shift-ScrollLock to see if the system prints anything out. These two
tests are important because they tell us something about what might be
going on. In particular, if somebody forgot to release a lock, it is easy
to imagine that the whole system will get badly wedged. If we ended up in
an interrupt handler trying to grab a held lock, then Shift-ScrollLock
won't work.
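The single-user-mode bug described in item 1 comes down to which signals the error handler thread leaves unblocked. As a rough illustration only (user-space C, not kernel code), the sketch below uses the POSIX calls sigaction(), sigprocmask() and sigpending() to mimic the effect of the enclosed patch's siginitsetinv(&current->blocked, SHUTDOWN_SIGS) once SHUTDOWN_SIGS is reduced to sigmask(SIGHUP): every signal except SIGHUP is blocked, so a SIGTERM like the one init sends stays pending instead of ending the thread. The names block_all_but_sighup(), is_pending() and got_sighup are hypothetical, and the analogy has a limit: a user process cannot block SIGKILL at all.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <signal.h>
#include <string.h>

/* Set by the SIGHUP handler; plays the role of signal_pending(current). */
static volatile sig_atomic_t got_sighup = 0;

static void on_sighup(int sig)
{
    (void)sig;
    got_sighup = 1;
}

/*
 * Hypothetical user-space analogue of
 *     siginitsetinv(&current->blocked, sigmask(SIGHUP));
 * i.e. block every signal except SIGHUP.  Returns 0 on success, -1 on error.
 * A process cannot block SIGKILL or SIGSTOP; sigprocmask() silently
 * leaves those two deliverable.
 */
static int block_all_but_sighup(void)
{
    struct sigaction sa;
    sigset_t blocked;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sighup;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGHUP, &sa, NULL) != 0)
        return -1;

    sigfillset(&blocked);
    sigdelset(&blocked, SIGHUP);
    return sigprocmask(SIG_SETMASK, &blocked, NULL);
}

/* Returns 1 if sig has been raised but is still held back by the mask. */
static int is_pending(int sig)
{
    sigset_t pending;

    if (sigpending(&pending) != 0)
        return 0;
    return sigismember(&pending, sig) == 1;
}
```

After block_all_but_sighup(), raise(SIGTERM) leaves the process running with SIGTERM merely pending, while raise(SIGHUP) runs on_sighup() at once. In the patch, that SIGHUP comes from send_sig(SIGHUP, shpnt->ehandler, 1) when a host adapter module is unloaded, which is why SIGHUP rather than SIGTERM or SIGKILL was chosen as the shutdown signal.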
> I have tried to upgrade my kernel to 2.2.14, which works fine, BUT the
> new kernel does not bring up my RAID-5 array of disks; I always have to
> go back to the stock Red Hat kernel. Does anyone have any idea what is up
> there?
No idea here.
-Eric
Attachment: kdiff (Bugfixes against 2.2 kernel)
--- ./drivers/scsi/scsi.c.~1~	Tue Jan  4 13:12:21 2000
+++ ./drivers/scsi/scsi.c	Sun Jan  9 13:47:25 2000
@@ -3040,7 +3040,7 @@
             struct semaphore sem = MUTEX_LOCKED;
             
             shpnt->eh_notify = &sem;
-            send_sig(SIGKILL, shpnt->ehandler, 1);
+            send_sig(SIGHUP, shpnt->ehandler, 1);
             down(&sem);
             shpnt->eh_notify = NULL;
         }
--- ./drivers/scsi/scsi_error.c.~1~	Mon Aug  9 15:04:40 1999
+++ ./drivers/scsi/scsi_error.c	Sun Jan  9 13:49:25 2000
@@ -35,7 +35,19 @@
 #include "hosts.h"
 #include "constants.h"
 
-#define SHUTDOWN_SIGS	(sigmask(SIGKILL)|sigmask(SIGINT)|sigmask(SIGTERM))
+/*
+ * We must always allow SHUTDOWN_SIGS.  Even if we are not a module,
+ * the host drivers that we are using may be loaded as modules, and
+ * when we unload these,  we need to ensure that the error handler thread
+ * can be shut down.
+ *
+ * Note - when we unload a module, we send a SIGHUP.  We mustn't
+ * enable SIGTERM, as this is how the init shuts things down when you
+ * go to single-user mode.  For that matter, init also sends SIGKILL,
+ * so we mustn't enable that one either.  We use SIGHUP instead.  Other
+ * options would be SIGPWR, I suppose.
+ */
+#define SHUTDOWN_SIGS	(sigmask(SIGHUP))
 
 #ifdef DEBUG
     #define SENSE_TIMEOUT SCSI_TIMEOUT
@@ -1074,7 +1086,10 @@
     }
   else
     {
-      return FAILED;
+            /*
+             * No more retries - report this one back to upper level.
+             */
+            return SUCCESS;
     }
 }
 
@@ -1947,7 +1962,9 @@
 	current->fs = fs;
 	atomic_inc(&fs->count);
 
-	siginitsetinv(&current->blocked, SHUTDOWN_SIGS);
+        if( host->loaded_as_module ) {
+                siginitsetinv(&current->blocked, SHUTDOWN_SIGS);
+        }
 
 
 	/*
@@ -1975,10 +1992,14 @@
 	     * trying to unload a module.
 	     */
             SCSI_LOG_ERROR_RECOVERY(1,printk("Error handler sleeping\n"));
-	    down_interruptible (&sem);
-
-	    if (signal_pending(current) )
-	      break;
+            if( host->loaded_as_module ) {
+                    down_interruptible(&sem);
+                    
+                    if (signal_pending(current))
+                            break;
+            } else {
+                    down(&sem);
+            }
 
             SCSI_LOG_ERROR_RECOVERY(1,printk("Error handler waking up\n"));
 
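The attached patch also changes how the error handler thread sleeps: when the host driver is loaded as a module it keeps down_interruptible() and leaves its loop if a signal is pending, and otherwise it uses the uninterruptible down(). A rough user-space analogue with POSIX semaphores is sketched below. The function name eh_wait() is hypothetical, and retrying sem_wait() after EINTR only approximates down(), which in the kernel does not wake for signals at all.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <errno.h>
#include <semaphore.h>

/*
 * Hypothetical analogue of the patched sleep in the error handler loop.
 * Returns 1 if the caller should leave its loop (the "break" in the
 * patch), or 0 if it was woken normally and should go handle errors.
 */
static int eh_wait(sem_t *sem, int loaded_as_module)
{
    if (loaded_as_module) {
        /* Like down_interruptible(): a signal cuts the wait short. */
        if (sem_wait(sem) == -1 && errno == EINTR)
            return 1;
        return 0;
    }

    /* Like down(): keep sleeping through any interruption. */
    while (sem_wait(sem) == -1 && errno == EINTR)
        ;
    return 0;
}
```

The design choice mirrored here is that only a modular host needs a way to stop the thread (on unload), so only that case listens for signals; a built-in host's thread cannot be talked into exiting by init's SIGTERM or SIGKILL.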