[1233] in linux-scsi channel archive
Re: 2.0.27: aha1740[9]_mbxout wait!
daemon@ATHENA.MIT.EDU (Barrie Spence)
Wed Jan 8 01:27:43 1997
Date: Tue, 07 Jan 1997 22:45:14 +0000
From: Barrie Spence <barrie@calvin.demon.co.uk>
To: Michael Weller <eowmob@exp-math.uni-essen.de>
CC: linux-scsi@vger.rutgers.edu
Michael Weller wrote:
>
> On Sun, 5 Jan 1997, Jon Lewis wrote:
>
> > On Sun, 5 Jan 1997, Barrie Spence wrote:
> >
> > > Since I upgraded my system from a 486DX2-66 to a P5/120 (x2) with an
> > > ASUS P55T2P4D motherboard, I've been getting these messages. As far as I
> > > can tell, they only occur when the system is idle - never under any
> > > significant load on the disk/controller.
> > >
> > > I believe that this may simply be a timing problem introduced with the
> > > Pentium - as part of the upgrade I ran the board with a single processor
> > > and with the caches disabled - during that time, I don't believe I ever
> > > saw these messages.
> >
> > I ran a P90 news server with a 1740 for some months. It used to get these
> > messages as well...and was rarely idle. I ended up wanting multiple SCSI
> > busses, and the 1740 driver seemed to lack support for more than one
> > card...so I went with NCR 810's.
>
> A historical word:
>
> The 1740 signals mbxout when it is currently not able to accept an address
> for a new SCSI command descriptor. The address has four bytes which have
> to be transferred one by one; while the address is only partially
> transferred, the 1740 also signals mbxout (until all 4 bytes are set). In
> the first version of the 1740 driver this could lead to a race condition
> where two processes accessed the 1740 at the same time, leading to an
> mbxout deadlock when one wrote only part of the address.
>
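The partial-write hazard described above can be illustrated with a small user-space simulation. This is only a sketch under stated assumptions, not the aha1740 driver code: `fake_mbxout`, `write_byte`, `write_address` and `mbxout_busy` are hypothetical names, and the least-significant-byte-first order is assumed purely for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simulated mailbox-out (hypothetical): latches an address once four
 * bytes have arrived, and counts as busy while only part is written. */
struct fake_mbxout {
    uint8_t  bytes[4];
    unsigned count;    /* bytes received so far for the current address */
    uint32_t latched;  /* last complete address seen by the adapter */
};

/* Busy while an address is only partially written. */
static bool mbxout_busy(const struct fake_mbxout *m)
{
    return m->count != 0;
}

/* Deliver one address byte; latch the address on the fourth byte. */
static void write_byte(struct fake_mbxout *m, uint8_t b)
{
    m->bytes[m->count++] = b;
    if (m->count == 4) {
        m->latched = (uint32_t)m->bytes[0]
                   | (uint32_t)m->bytes[1] << 8
                   | (uint32_t)m->bytes[2] << 16
                   | (uint32_t)m->bytes[3] << 24;
        m->count = 0;
    }
}

/* Write all four bytes back to back (assumed order: low byte first). */
static void write_address(struct fake_mbxout *m, uint32_t addr)
{
    for (int i = 0; i < 4; i++)
        write_byte(m, (uint8_t)(addr >> (8 * i)));
}
```

If one writer is interrupted after two calls to `write_byte` and a second writer then calls `write_address`, the simulated adapter latches a mixture of both addresses and is left with two stray bytes pending, so `mbxout_busy` stays true. That is the stuck state the paragraph above calls an mbxout deadlock; keeping the four writes inside one uninterrupted critical section avoids it.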
> This problem was solved long ago; according to Adaptec, the CPU should
> otherwise never see mbxout_wait, because the 1740 is so incredibly fast
> at handling a new SCSI command descriptor that the CPU can never catch
> it with mbxout_wait set.
>
> Today:
>
> When I last looked in the 1740 driver, in the (impossible) case it catches
> it with mbxout_wait it just spits out this warning message and does not do
> much about it. However, there is a problem with printk reenabling
> interrupts and opening the possibility for a race condition in this critical
> area of the 1740 driver.
>
> So my question: Apart from this (warning) message, does your system
> continue to run, or does it lock up instantly (which could be a side
> effect of the printk)?
It appears to run fine - uptime was almost 20 days when I rebooted for a
new kernel config - I've pushed it hard with lots of disk activity during
that time, but as I said in my first post the messages are usually logged
when the system is unattended (idle) - from today, while I was at work:
Jan 7 09:08:22 calvin kernel: aha1740[24]_mbxout wait!
Jan 7 11:48:22 calvin kernel: aha1740[11]_mbxout wait!
Jan 7 13:08:21 calvin kernel: aha1740[16]_mbxout wait!
Jan 7 13:18:23 calvin kernel: aha1740[12]_mbxout wait!
Jan 7 13:28:24 calvin kernel: aha1740[10]_mbxout wait!
Jan 7 14:38:22 calvin kernel: aha1740[30]_mbxout wait!
Jan 7 17:08:21 calvin kernel: aha1740[30]_mbxout wait!
Jan 7 17:28:23 calvin kernel: aha1740[22]_mbxout wait!
Jan 7 17:38:24 calvin kernel: aha1740[24]_mbxout wait!
Jan 7 18:28:21 calvin kernel: aha1740[2]_mbxout wait!
Jan 7 18:58:24 calvin kernel: aha1740[4]_mbxout wait!
> I would suspect that your setup is just too fast and sees the impossible
> (mbxout still busy). I'd assume a tiny patch to make it loop a few (more)
> tiny times until mbxout is no longer busy is all you need.
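The "loop a few more times" idea suggested above might look roughly like the following bounded poll. This is a minimal sketch against a simulated adapter, not the actual 174x driver patch: `MBXOUT_BUSY`, `fake_adapter`, `read_status` and `wait_mbxout_free` are hypothetical names, and the real status register layout may differ.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical status bit: set while the adapter cannot accept a new
 * mailbox-out address. The real driver's register layout may differ. */
#define MBXOUT_BUSY 0x01u

/* Simulated adapter: reports busy for the first busy_reads status reads. */
struct fake_adapter {
    unsigned busy_reads;
    unsigned reads_done;
};

/* Stand-in for reading the adapter's status port. */
static uint8_t read_status(struct fake_adapter *a)
{
    a->reads_done++;
    if (a->busy_reads > 0) {
        a->busy_reads--;
        return MBXOUT_BUSY;
    }
    return 0;
}

/* Poll up to max_tries times; return true once mailbox-out is free. */
static bool wait_mbxout_free(struct fake_adapter *a, unsigned max_tries)
{
    for (unsigned i = 0; i < max_tries; i++) {
        if (!(read_status(a) & MBXOUT_BUSY))
            return true;
    }
    return false; /* still busy: caller decides whether to warn or fail */
}
```

Bounding the loop keeps a wedged adapter from hanging the CPU forever, and the warning would then be printed only after the retries run out rather than on the first busy read.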
Yes, I'm running with the processor and L2 cache disabled just now - no
messages.
> What's your kernel version? I could try getting the same (I'm a bit out
> of sync with current ALPHA versions) and providing another tiny patch to the
> 174x driver.
Stock 2.0.27 (RedHat devel rpm). A patch would be greatly appreciated.
FYI, from dmesg:
Configuring Adaptec at IO:1c80, IRQ 11
EATA0: address 0x1c88 in use, skipping probe.
EATA0: rev. 2.0B, EISA, PORT 0x3c88, IRQ 15, DMA 255, SG 64, Mbox 64, CmdLun 2.
EATA0: SCSI channel 0 enabled, host target ID 7.
EATA1: address 0x330 in use, skipping probe.
EATA/DMA 2.0x: Copyright (C) 1994, 1995, 1996 Dario Ballabio.
scsi0 : Adaptec 174x (EISA)
scsi1 : EATA/DMA 2.0x rev. 2.30.00
scsi : 2 hosts.
Vendor: DEC Model: DSP3160S Rev: T427
Type: Direct-Access ANSI SCSI revision: 02
Detected scsi disk sda at scsi0, channel 0, id 0, lun 0
Vendor: CONNER Model: CFP2107S 2.14GB Rev: 2B4B
Type: Direct-Access ANSI SCSI revision: 02
Detected scsi disk sdb at scsi0, channel 0, id 2, lun 0
Vendor: HP Model: HP35470A Rev: 1009
Type: Sequential-Access ANSI SCSI revision: 02
Detected scsi tape st0 at scsi0, channel 0, id 4, lun 0
Vendor: TOSHIBA Model: CD-ROM XM-3401TA Rev: 0283
Type: CD-ROM ANSI SCSI revision: 02
Detected scsi CD-ROM sr0 at scsi0, channel 0, id 6, lun 0
Vendor: DEC Model: DSP3210S Rev: X441
Type: Direct-Access ANSI SCSI revision: 02
Detected scsi disk sdc at scsi1, channel 0, id 1, lun 0
Vendor: HP Model: HP35470A Rev: 7 09
Type: Sequential-Access ANSI SCSI revision: 02
Detected scsi tape st1 at scsi1, channel 0, id 3, lun 0
Vendor: IMS Model: CDD521/10 Rev: 2.04
Type: WORM ANSI SCSI revision: 01
Detected scsi CD-ROM sr1 at scsi1, channel 0, id 4, lun 0
Vendor: SONY Model: CD-ROM CDU-8012 Rev: 3.1a
Type: CD-ROM ANSI SCSI revision: 02
Detected scsi CD-ROM sr2 at scsi1, channel 0, id 5, lun 0
scsi : detected 2 SCSI tapes 3 SCSI cdroms 3 SCSI disks total.
Thanks,
Barrie
--
Barrie Spence Sanity Clause? There is no Sanity Clause
Home: barrie@calvin.demon.co.uk Telephone +44 1506 442304
Play: barrie@sqf.hp.com Telephone +44 131 331 7103