[4524] in linux-scsi channel archive


AIC7XXX timeout problems on Alpha...

daemon@ATHENA.MIT.EDU (Matti Aarnio)
Tue Aug 11 10:53:18 1998

To: dledford@dialnet.net
Date: 	Tue, 11 Aug 1998 16:42:03 +0300 (EEST)
Cc: linux-scsi@vger.rutgers.edu
From: Matti Aarnio <matti.aarnio@sonera.fi>

Howdy,

  I have been unsure whether the problem lies in the AIC7XXX driver
or in the RAID(0) subsystem (or in the Linux SCSI system overall).

Now I am convinced that it is in the AIC7XXX driver.
I see these same errors on the same system without RAID being
in use at all.  In fact, I saw them, although rarely, even before
I added my two new 9GB Quantum disks as a RAID0 stripe set.

With high disk activity I have an increased probability of encountering
the error condition.  If I have a '(while sync;do sleep 1;done)&' loop
running in the background, the error is somewhat less likely to occur
than without it while I am hammering the system with some massive
software compilation (e.g. compiling X on my Alpha).

Right now the version of AIC7XXX driver in use is the one in kernel
2.1.115.

I recall also having tried the aic7xxx-5.1.0-pre5-2.1.109.patch.gz
version, and still encountering the same problem.

My suspicion is that somewhere among its 10 000 lines of source the
AIC7XXX driver does something dangerous with pointer chains without
having a proper spinlock (or equivalent) in place outside the interrupt
context.

It could be a more general SCSI-layer problem as well, although that
is an order of magnitude more complex, and nobody seems to be
well versed in it...

Below are a lot more details, with some comments between the cut&paste
passages.

/Matti Aarnio <matti.aarnio@sonera.fi>

My disks are:

 -----------------------------------------------------------
# cat /proc/scsi/scsi
Attached devices:
Host: scsi0 Channel: 00 Id: 00 Lun: 00
  Vendor: DEC      Model: RZ1BB-BS (C) DEC Rev: 0658
  Type:   Direct-Access                    ANSI SCSI revision: 02
Host: scsi0 Channel: 00 Id: 01 Lun: 00
  Vendor: DEC      Model: RZ1CC-BA (C) DEC Rev: 880F
  Type:   Direct-Access                    ANSI SCSI revision: 02
Host: scsi0 Channel: 00 Id: 02 Lun: 00
  Vendor: QUANTUM  Model: VIKING II 9.1WLS Rev: 4110
  Type:   Direct-Access                    ANSI SCSI revision: 02
Host: scsi0 Channel: 00 Id: 04 Lun: 00
  Vendor: QUANTUM  Model: VIKING II 9.1WLS Rev: 4110
  Type:   Direct-Access                    ANSI SCSI revision: 02
[root@mea /root]#
 -----------------------------------------------------------

( from "cat /proc/scsi/aic7xxx/0" command output )
 -----------------------------------------------------------
Adaptec AIC7xxx driver version: 5.0.20/3.2.4
Compile Options:
  AIC7XXX_RESET_DELAY    : 5
  AIC7XXX_TAGGED_QUEUEING: Adapter Support Enabled
                             Check below to see which
                             devices use tagged queueing
  AIC7XXX_PAGE_ENABLE    : Enabled (This is no longer an option)
  AIC7XXX_PROC_STATS     : Enabled

Adapter Configuration:
           SCSI Adapter: Adaptec AHA-294X Ultra SCSI host adapter
                           Ultra Wide Controller
    Programmed I/O Base: 9000
      Adaptec SCSI BIOS: Enabled
                    IRQ: 40
                   SCBs: Active 0, Max Active 4,
                         Allocated 29, HW 16, Page 255
             Interrupts: 152343
      BIOS Control Word: 0x18b6
   Adapter Control Word: 0x005e
   Extended Translation: Enabled
Disconnect Enable Flags: 0xffff
     Ultra Enable Flags: 0xffff
 Tag Queue Enable Flags: 0x0000
Ordered Queue Tag Flags: 0x0000
Default Tag Queue Depth: 8
    Tagged Queue By Device array for aic7xxx host instance 0:
      {255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255}
    Actual queue depth per device for aic7xxx host instance 0:
      {1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1}

Statistics:
(scsi0:0:0:0)
nxfers 22637 (18160 read;4477 written)
blks(512) rd=380703; blks(512) wr=119692
        < 512 512-1K   1-2K   2-4K   4-8K  8-16K 16-32K 32-64K 64-128K >128K
 Reads:     0      1  13175    154    609   1018    682   2027    494      0 
Writes:     0      0   3122    472    111    286     43     30    413      0 

(scsi0:0:1:0)
nxfers 41002 (40464 read;538 written)
blks(512) rd=561455; blks(512) wr=17466
        < 512 512-1K   1-2K   2-4K   4-8K  8-16K 16-32K 32-64K 64-128K >128K
 Reads:     0      1  35701      8     25    541     50   4136      2      0 
Writes:     0      0     21      5      0      1    511      0      0      0 

(scsi0:0:2:0)
nxfers 46545 (40105 read;6440 written)
blks(512) rd=816947; blks(512) wr=167312
        < 512 512-1K   1-2K   2-4K   4-8K  8-16K 16-32K 32-64K 64-128K >128K
 Reads:     0      1  27539     47    360    254    656  11247      1      0 
Writes:     0      0   4295      0      0      0      0   2145      0      0 

(scsi0:0:4:0)
nxfers 42031 (38469 read;3562 written)
blks(512) rd=843071; blks(512) wr=137282
        < 512 512-1K   1-2K   2-4K   4-8K  8-16K 16-32K 32-64K 64-128K >128K
 Reads:     0      1  23959     63    304   2195    706  11240      1      0 
Writes:     0      0   1303     14     30     62    746   1407      0      0
 -----------------------------------------------------------
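As a sanity check on the statistics above, the per-size histogram rows
should add up to the read and write counts on the "nxfers" line.  Here is
a minimal Python sketch, assuming the stanza layout shown above; the
function name check_stanza and the regular expressions are illustrative
only and are not part of the driver.

```python
import re

# One per-device stanza, copied from the /proc/scsi/aic7xxx/0 output above.
STANZA = """\
(scsi0:0:0:0)
nxfers 22637 (18160 read;4477 written)
blks(512) rd=380703; blks(512) wr=119692
        < 512 512-1K   1-2K   2-4K   4-8K  8-16K 16-32K 32-64K 64-128K >128K
 Reads:     0      1  13175    154    609   1018    682   2027    494      0
Writes:     0      0   3122    472    111    286     43     30    413      0
"""

def check_stanza(text):
    """Parse one stanza and cross-check its counters (hypothetical helper)."""
    total, reads, writes = map(int, re.search(
        r"nxfers (\d+) \((\d+) read;(\d+) written\)", text).groups())
    rd_blks, wr_blks = map(int, re.search(
        r"rd=(\d+); blks\(512\) wr=(\d+)", text).groups())
    hist = {}
    for label in ("Reads", "Writes"):
        # Each histogram row is "<label>:" followed by ten bucket counts.
        row = re.search(rf"^\s*{label}:(.*)$", text, re.M).group(1)
        hist[label] = [int(n) for n in row.split()]
    return {
        "total_ok": total == reads + writes,
        "reads_ok": sum(hist["Reads"]) == reads,
        "writes_ok": sum(hist["Writes"]) == writes,
        # 512-byte blocks converted to MiB.
        "read_mib": rd_blks * 512 / 2**20,
        "write_mib": wr_blks * 512 / 2**20,
    }

result = check_stanza(STANZA)
print(result)
```

For the scsi0:0:0:0 stanza all three checks hold, so the counters at
least are internally consistent.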


Typical errors I see are like these:

 -----------------------------------------------------------
scsi : aborting command due to timeout : pid 141518, scsi0, channel 0, id 0, lun 0 Read (6) 0f 38 96 02 00
scsi : aborting command due to timeout : pid 141518, scsi0, channel 0, id 0, lun 0 Read (6) 0f 38 96 02 00
scsi : aborting command due to timeout : pid 141518, scsi0, channel 0, id 0, lun 0 Read (6) 0f 38 96 02 00
scsi : aborting command due to timeout : pid 141518, scsi0, channel 0, id 0, lun 0 Read (6) 0f 38 96 02 00
scsi : aborting command due to timeout : pid 141518, scsi0, channel 0, id 0, lun 0 Read (6) 0f 38 96 02 00
scsi : aborting command due to timeout : pid 141518, scsi0, channel 0, id 0, lun 0 Read (6) 0f 38 96 02 00
scsi : aborting command due to timeout : pid 141518, scsi0, channel 0, id 0, lun 0 Read (6) 0f 38 96 02 00
-----------------------------------------------------------

This wedges the system completely.
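
To see which request keeps timing out, the bytes in the log line can be
decoded.  Here is a minimal Python sketch, assuming the kernel prints
the command name followed by the five CDB bytes that come after the
READ(6) opcode (0x08), and using the SCSI-2 READ(6) layout: 3 bits of
LUN plus a 21-bit block address, a one-byte transfer length, and a
control byte.  The function name decode_read6 is illustrative.

```python
def decode_read6(tail_hex):
    """Decode the five bytes that follow the READ(6) opcode (0x08)."""
    b = bytes.fromhex(tail_hex)
    if len(b) != 5:
        raise ValueError("READ(6) has exactly five bytes after the opcode")
    lun = b[0] >> 5                                   # top 3 bits (SCSI-2)
    lba = ((b[0] & 0x1F) << 16) | (b[1] << 8) | b[2]  # 21-bit block address
    nblocks = b[3] or 256                             # length 0 means 256 blocks
    return {"lun": lun, "lba": lba, "blocks": nblocks, "control": b[4]}

decoded = decode_read6("0f 38 96 02 00")
print(decoded)
```

Under those assumptions the stuck command is a 2-block read at logical
block 997526 on id 0, and the same pid and block address repeat on
every line, i.e. it is one command being aborted over and over.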

