[4524] in linux-scsi channel archive
AIC7XXX timeout problems on Alpha...
daemon@ATHENA.MIT.EDU (Matti Aarnio)
Tue Aug 11 10:53:18 1998
To: dledford@dialnet.net
Date: Tue, 11 Aug 1998 16:42:03 +0300 (EEST)
Cc: linux-scsi@vger.rutgers.edu
From: Matti Aarnio <matti.aarnio@sonera.fi>
Howdy,
I have been unsure whether the problem is in the AIC7XXX driver or in
the RAID(0) subsystem (or in the Linux SCSI system overall).
Now I am convinced that it is in the AIC7XXX driver.
I see these same errors on the same system even when RAID is not in
use at all. In fact I saw them, although rarely, even before I added
my two new 9GB Quantum disks as a RAID0 stripe set.
With high disk activity I have an increased probability of encountering
the error condition. If I have a '(while sync;do sleep 1;done)&' loop
running in the background, the error is a bit less likely to occur than
without it while I am hammering the system with some massive software
compilation process (e.g. compiling X on my Alpha).
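For anyone who wants the same background flushing without a shell loop,
here is a minimal Python sketch of it. It is only an illustration: the
function name periodic_sync and the bounded iteration count are
assumptions made here (the shell loop above runs until killed), and it
says nothing about why the flushing changes the failure rate.

```python
import os
import time


def periodic_sync(iterations, interval=1.0):
    """Flush dirty buffers `iterations` times, sleeping `interval` seconds
    after each flush.

    Rough equivalent of the shell loop `while sync; do sleep 1; done`,
    but bounded so that it terminates (hypothetical helper).
    """
    done = 0
    for _ in range(iterations):
        os.sync()             # same system call that sync(1) issues
        done += 1
        time.sleep(interval)
    return done
```

Calling periodic_sync(3600) from a background process would cover
roughly an hour of compilation.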
Right now the version of AIC7XXX driver in use is the one in kernel
2.1.115.
I recall also having tried the aic7xxx-5.1.0-pre5-2.1.109.patch.gz
version, and still encountering the same problem.
My suspicion is that somewhere among its 10 000 lines of source the
AIC7XXX driver does something dangerous with pointer chains without
having a proper spinlock (or whatever) in place outside the interrupt
context.
It could be a more general SCSI-layer problem as well, although that
layer is an order of magnitude more complex, and nobody seems to be
well versed in it...
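To make the kind of bug I suspect concrete, here is a userspace analogy
in Python. It is NOT driver code: the names Node, PendingQueue and run
are hypothetical, and threading.Lock merely stands in for a kernel
spinlock. The point is only that every path touching the head/tail
pointers of a shared chain, in interrupt context or outside it, must
hold the same lock.

```python
import threading


class Node:
    def __init__(self, tag):
        self.tag = tag
        self.next = None


class PendingQueue:
    """Singly linked FIFO shared by a submitter and a completion handler."""

    def __init__(self):
        self.head = None
        self.tail = None
        self.lock = threading.Lock()    # stands in for a kernel spinlock

    def push(self, tag):
        node = Node(tag)
        with self.lock:                 # protects both head and tail updates
            if self.tail is None:
                self.head = self.tail = node
            else:
                self.tail.next = node
                self.tail = node

    def pop(self):
        with self.lock:
            node = self.head
            if node is None:
                return None
            self.head = node.next
            if self.head is None:       # queue became empty
                self.tail = None
            return node.tag


def run(n_commands=1000):
    """Push n_commands tags from one thread, drain them from another."""
    q = PendingQueue()
    completed = []

    def submitter():                    # plays the non-interrupt context
        for tag in range(n_commands):
            q.push(tag)

    def completer():                    # plays the interrupt handler
        while len(completed) < n_commands:
            tag = q.pop()
            if tag is not None:
                completed.append(tag)

    threads = [threading.Thread(target=submitter),
               threading.Thread(target=completer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return completed
```

If one side updated the chain without taking the lock, a node could be
dropped from it and its command would never be completed. Whether that
is what actually happens in the driver is speculation on my part.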
Below are a lot more details, with some comments in between the
cut&paste passages.
/Matti Aarnio <matti.aarnio@sonera.fi>
My disks are:
-----------------------------------------------------------
# cat /proc/scsi/scsi
Attached devices:
Host: scsi0 Channel: 00 Id: 00 Lun: 00
Vendor: DEC Model: RZ1BB-BS (C) DEC Rev: 0658
Type: Direct-Access ANSI SCSI revision: 02
Host: scsi0 Channel: 00 Id: 01 Lun: 00
Vendor: DEC Model: RZ1CC-BA (C) DEC Rev: 880F
Type: Direct-Access ANSI SCSI revision: 02
Host: scsi0 Channel: 00 Id: 02 Lun: 00
Vendor: QUANTUM Model: VIKING II 9.1WLS Rev: 4110
Type: Direct-Access ANSI SCSI revision: 02
Host: scsi0 Channel: 00 Id: 04 Lun: 00
Vendor: QUANTUM Model: VIKING II 9.1WLS Rev: 4110
Type: Direct-Access ANSI SCSI revision: 02
[root@mea /root]#
-----------------------------------------------------------
( from "cat /proc/scsi/aic7xxx/0" command output )
-----------------------------------------------------------
Adaptec AIC7xxx driver version: 5.0.20/3.2.4
Compile Options:
AIC7XXX_RESET_DELAY : 5
AIC7XXX_TAGGED_QUEUEING: Adapter Support Enabled
Check below to see which
devices use tagged queueing
AIC7XXX_PAGE_ENABLE : Enabled (This is no longer an option)
AIC7XXX_PROC_STATS : Enabled
Adapter Configuration:
SCSI Adapter: Adaptec AHA-294X Ultra SCSI host adapter
Ultra Wide Controller
Programmed I/O Base: 9000
Adaptec SCSI BIOS: Enabled
IRQ: 40
SCBs: Active 0, Max Active 4,
Allocated 29, HW 16, Page 255
Interrupts: 152343
BIOS Control Word: 0x18b6
Adapter Control Word: 0x005e
Extended Translation: Enabled
Disconnect Enable Flags: 0xffff
Ultra Enable Flags: 0xffff
Tag Queue Enable Flags: 0x0000
Ordered Queue Tag Flags: 0x0000
Default Tag Queue Depth: 8
Tagged Queue By Device array for aic7xxx host instance 0:
{255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255}
Actual queue depth per device for aic7xxx host instance 0:
{1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1}
Statistics:
(scsi0:0:0:0)
nxfers 22637 (18160 read;4477 written)
blks(512) rd=380703; blks(512) wr=119692
< 512 512-1K 1-2K 2-4K 4-8K 8-16K 16-32K 32-64K 64-128K >128K
Reads: 0 1 13175 154 609 1018 682 2027 494 0
Writes: 0 0 3122 472 111 286 43 30 413 0
(scsi0:0:1:0)
nxfers 41002 (40464 read;538 written)
blks(512) rd=561455; blks(512) wr=17466
< 512 512-1K 1-2K 2-4K 4-8K 8-16K 16-32K 32-64K 64-128K >128K
Reads: 0 1 35701 8 25 541 50 4136 2 0
Writes: 0 0 21 5 0 1 511 0 0 0
(scsi0:0:2:0)
nxfers 46545 (40105 read;6440 written)
blks(512) rd=816947; blks(512) wr=167312
< 512 512-1K 1-2K 2-4K 4-8K 8-16K 16-32K 32-64K 64-128K >128K
Reads: 0 1 27539 47 360 254 656 11247 1 0
Writes: 0 0 4295 0 0 0 0 2145 0 0
(scsi0:0:4:0)
nxfers 42031 (38469 read;3562 written)
blks(512) rd=843071; blks(512) wr=137282
< 512 512-1K 1-2K 2-4K 4-8K 8-16K 16-32K 32-64K 64-128K >128K
Reads: 0 1 23959 63 304 2195 706 11240 1 0
Writes: 0 0 1303 14 30 62 746 1407 0 0
-----------------------------------------------------------
Typical errors I see are like these:
-----------------------------------------------------------
scsi : aborting command due to timeout : pid 141518, scsi0, channel 0, id 0, lun 0 Read (6) 0f 38 96 02 00
scsi : aborting command due to timeout : pid 141518, scsi0, channel 0, id 0, lun 0 Read (6) 0f 38 96 02 00
scsi : aborting command due to timeout : pid 141518, scsi0, channel 0, id 0, lun 0 Read (6) 0f 38 96 02 00
scsi : aborting command due to timeout : pid 141518, scsi0, channel 0, id 0, lun 0 Read (6) 0f 38 96 02 00
scsi : aborting command due to timeout : pid 141518, scsi0, channel 0, id 0, lun 0 Read (6) 0f 38 96 02 00
scsi : aborting command due to timeout : pid 141518, scsi0, channel 0, id 0, lun 0 Read (6) 0f 38 96 02 00
scsi : aborting command due to timeout : pid 141518, scsi0, channel 0, id 0, lun 0 Read (6) 0f 38 96 02 00
-----------------------------------------------------------
This wedges the system completely.
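For reference, the bytes in the timeout message can be decoded with a
few lines of Python. This sketch assumes that the kernel prints CDB
bytes 1..5 after the opcode name (the opcode 0x08 itself appearing as
"Read (6)") and that they follow the SCSI-2 READ(6) layout; the function
name decode_read6 is made up for illustration.

```python
def decode_read6(tail_hex):
    """Decode the five bytes printed after 'Read (6)' in the log line.

    Assumed layout (SCSI-2 READ(6), opcode byte already stripped):
      byte 1: LUN in the top 3 bits, LBA bits 20..16 in the low 5 bits
      byte 2: LBA bits 15..8
      byte 3: LBA bits 7..0
      byte 4: transfer length in blocks (0 means 256)
      byte 5: control byte
    """
    b = [int(x, 16) for x in tail_hex.split()]
    if len(b) != 5:
        raise ValueError("expected 5 bytes after the opcode")
    lun = b[0] >> 5
    lba = ((b[0] & 0x1F) << 16) | (b[1] << 8) | b[2]
    blocks = b[3] if b[3] != 0 else 256
    return {"lun": lun, "lba": lba, "blocks": blocks, "control": b[4]}


print(decode_read6("0f 38 96 02 00"))
# prints {'lun': 0, 'lba': 997526, 'blocks': 2, 'control': 0}
```

Under those assumptions the stuck command is a 2-block (1 KB with
512-byte blocks) read at logical block 997526 on id 0, i.e. an ordinary
small read, and all seven messages refer to the same command (pid
141518).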