[1232] in linux-scsi channel archive
Re: 2.0.27: aha1740[9]_mbxout wait!
daemon@ATHENA.MIT.EDU (Michael Weller)
Tue Jan 7 08:06:18 1997
Date: Tue, 7 Jan 1997 12:13:55 +0100 (MEZ)
From: Michael Weller <eowmob@exp-math.uni-essen.de>
To: Jon Lewis <jlewis@inorganic5.fdt.net>
Cc: Barrie Spence <barrie@calvin.demon.co.uk>, linux-scsi@vger.rutgers.edu
In-Reply-To: <Pine.LNX.3.95.970105192856.174L-100000@inorganic5.fdt.net>
On Sun, 5 Jan 1997, Jon Lewis wrote:
> On Sun, 5 Jan 1997, Barrie Spence wrote:
>
> > Since I upgraded my system from a 486DX2-66 to a P5/120 (x2) with an
> > ASUS P55T2P4D motherboard, I've been getting these messages. As far as I
> > can tell, they only occur when the system is idle - never under any
> > significant load on the disk/controller.
> >
> > I believe that this may simply be a timing problem introduced with the
> > pentium - as part of the upgrade I ran the board with a single processor
> > and with the caches disabled - during that time, I don't believe I ever
> > saw these messages.
>
> I ran a P90 news server with a 1740 for some months. It used to get these
> messages as well...and was rarely idle. I ended up wanting multiple SCSI
> busses, and the 1740 driver seemed to lack support for more than one
> card...so I went with NCR 810's.
A historical word:
The 1740 signals mbxout when it is currently unable to accept the address
of a new SCSI command descriptor. The address has four bytes, which have
to be transferred one by one; while the address is only partially
transferred, the 1740 also signals mbxout (until all 4 bytes are set). In
the first version of the 1740 driver this could lead to a race condition
in which two processes accessed the 1740 at the same time, leading to an
mbxout deadlock when one had written only part of the address.
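To make the partial-write state concrete, here is a minimal user-space
sketch of a mailbox that stays busy from the first address byte until the
adapter has consumed the complete address. It is a model only: struct
sim_mbox, the sim_* functions and the low-byte-first order are my
assumptions for illustration, not the real 1740 register layout or driver
code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the mailbox-out register; not the real 1740 layout. */
struct sim_mbox {
    uint8_t  bytes[4];    /* address bytes received so far */
    unsigned count;       /* how many of the 4 bytes have been written */
    bool     cmd_pending; /* complete address latched, not yet consumed */
};

/* Busy while an address is partially written or a full one is pending. */
static bool sim_mbxout_busy(const struct sim_mbox *m)
{
    return m->count != 0 || m->cmd_pending;
}

/* Write one address byte; the fourth byte latches the complete address. */
static void sim_write_byte(struct sim_mbox *m, uint8_t b)
{
    assert(!m->cmd_pending && m->count < 4);
    m->bytes[m->count++] = b;
    if (m->count == 4) {
        m->count = 0;
        m->cmd_pending = true;
    }
}

/* The adapter picks up the latched address and frees the mailbox.
 * Byte order (low byte first) is an assumption of this sketch. */
static uint32_t sim_adapter_consume(struct sim_mbox *m)
{
    uint32_t addr = (uint32_t)m->bytes[0]
                  | (uint32_t)m->bytes[1] << 8
                  | (uint32_t)m->bytes[2] << 16
                  | (uint32_t)m->bytes[3] << 24;
    m->cmd_pending = false;
    return addr;
}
```

In this model, a writer that is interrupted after two calls to
sim_write_byte() leaves sim_mbxout_busy() returning true. A second writer
that spins on that flag can never proceed unless the first one gets to
finish, which is the shape of the old deadlock.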
This problem was solved long ago. According to Adaptec, the CPU should
otherwise never see mbxout_wait, because the 1740 is so incredibly fast at
handling a new SCSI command descriptor that the CPU can never catch
it with mbxout_wait set.
Today:
When I last looked at the 1740 driver, in the (impossible) case that it
catches the card with mbxout_wait set, it just spits out this warning
message and does not do much about it. However, there is a problem with
printk reenabling interrupts, which opens the possibility of a race
condition in this critical area of the 1740 driver.
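The printk concern can be sketched with a toy model in which a single flag
stands in for the CPU interrupt state and the warning call switches
interrupts back on as a side effect. Everything here (struct sim_cpu,
sim_cli, sim_warn, enter_critical) is hypothetical and only illustrates the
ordering problem; it is not code from the driver.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical CPU model: one flag standing in for the interrupt state. */
struct sim_cpu {
    bool irqs_on;
    int  warnings;
};

static void sim_cli(struct sim_cpu *c) { c->irqs_on = false; }

/* Models a logging call that turns interrupts back on as a side effect. */
static void sim_warn(struct sim_cpu *c)
{
    c->warnings++;
    c->irqs_on = true;
}

/*
 * Enter the critical section and warn if the mailbox is busy.
 * If redisable is true, interrupts are masked again after the warning.
 * Returns true if interrupts are still off when the mailbox write
 * would start.
 */
static bool enter_critical(struct sim_cpu *c, bool mbx_busy, bool redisable)
{
    assert(c);
    sim_cli(c);
    if (mbx_busy) {
        sim_warn(c);
        if (redisable)
            sim_cli(c);
    }
    return !c->irqs_on;
}
```

With redisable set to false, the busy path returns with interrupts on, so
an interrupt handler could touch the mailbox in the middle of the address
write. Masking interrupts again right after the warning closes that window
in this model.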
So my question: Apart from this (warning) message, does your system
continue to run, or does it lock up instantly (which could be a side
effect of the printk)?
I would suspect that your setup is just too fast and sees the impossible
(mbxout still busy). I'd assume that a tiny patch making it loop a few
(more) times until mbxout is no longer busy is all you need.
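Roughly, the kind of loop I have in mind could look like the sketch below.
It is written against a hypothetical status callback rather than the real
port reads, and the names (busy_fn, wait_mbxout_free, fake_hw) and the
spin limit are assumptions for illustration, not the actual patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical status callback: returns true while mbxout is still busy. */
typedef bool (*busy_fn)(void *ctx);

/*
 * Poll until the mailbox is free or max_spins polls have been made.
 * Returns the number of polls that saw "busy" on success, -1 on timeout.
 */
static int wait_mbxout_free(busy_fn busy, void *ctx, int max_spins)
{
    assert(max_spins > 0);
    for (int spins = 0; spins < max_spins; spins++) {
        if (!busy(ctx))
            return spins;
    }
    return -1;
}

/* Stand-in for the hardware: reports busy for a fixed number of polls. */
struct fake_hw {
    int busy_polls_left;
};

static bool fake_busy(void *ctx)
{
    struct fake_hw *hw = ctx;
    if (hw->busy_polls_left > 0) {
        hw->busy_polls_left--;
        return true;
    }
    return false;
}
```

Bounding the loop is a deliberate choice in this sketch: if the card never
clears the flag, the caller gets -1 and can report an error instead of
spinning forever with interrupts off.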
What's your kernel version? I could try getting the same (I'm a bit out
of sync with current ALPHA versions) and providing another tiny patch to the
174x driver.
Any comments from anyone else (Brad, Andreas)?
Michael.
(eowmob@exp-math.uni-essen.de or eowmob@pollux.exp-math.uni-essen.de
Please do not use my vm or de0hrz1a accounts anymore. In case of real
problems reaching me try mat42b@spi.power.uni-essen.de instead.)