[1231] in linux-scsi channel archive


Re: 2.0.27: aha1740[9]_mbxout wait!

daemon@ATHENA.MIT.EDU (Michael Weller)
Tue Jan 7 08:06:16 1997

Date: 	Tue, 7 Jan 1997 12:13:55 +0100 (MEZ)
From: Michael Weller <eowmob@exp-math.uni-essen.de>
To: Jon Lewis <jlewis@inorganic5.fdt.net>
Cc: Barrie Spence <barrie@calvin.demon.co.uk>, linux-scsi@vger.rutgers.edu
In-Reply-To: <Pine.LNX.3.95.970105192856.174L-100000@inorganic5.fdt.net>

On Sun, 5 Jan 1997, Jon Lewis wrote:

> On Sun, 5 Jan 1997, Barrie Spence wrote:
> 
> > Since I upgraded my system from a 486DX2-66 to a P5/120 (x2) with an
> > ASUS P55T2P4D motherboard, I've been getting these messages. As far as I
> > can tell, they only occur when the system is idle - never under any
> > significant load on the disk/controller.
> > 
> > I believe that this may simply be a timing problem introduced with the
> > pentium - as part of the upgrade I ran the board with a single processor
> > and with the caches disabled - during that time, I don't believe I ever
> > saw these messages.
> 
> I ran a P90 news server with a 1740 for some months.  It used to get these
> messages as well...and was rarely idle.  I ended up wanting multiple SCSI
> busses, and the 1740 driver seemed to lack support for more than one
> card...so I went with NCR 810's.

A historical word:

The 1740 signals mbxout when it is currently unable to accept an address
for a new SCSI command descriptor. The address has four bytes, which have
to be transferred one by one; while the address is only partially
transferred, the 1740 also signals mbxout (until all 4 bytes are set). In
the first version of the 1740 driver this could lead to a race condition
where two processes accessed the 1740 at the same time, leading to an
mbxout deadlock when one had written only part of the address.
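
To illustrate the old race, here is a minimal user-space sketch in C. It
is a toy model, not code from the 1740 driver: struct mock_mbx, its
fields, the helper names and the low-byte-first order are all assumptions
made up for the example. It only shows how a writer that stops after part
of the address leaves mbxout busy for everyone else.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the mailbox-out register; not the real layout. */
struct mock_mbx {
    uint8_t  bytes[4];   /* address bytes written so far */
    int      filled;     /* how many of the 4 bytes are set */
    uint32_t last_addr;  /* last complete address the model accepted */
    int      accepted;   /* number of complete addresses accepted */
};

/* Model of "mbxout": busy while an address is only partially written. */
static bool mbx_busy(const struct mock_mbx *m)
{
    return m->filled > 0 && m->filled < 4;
}

/* Write one address byte; the model accepts the address on the 4th byte. */
static void mbx_write_byte(struct mock_mbx *m, uint8_t b)
{
    m->bytes[m->filled++] = b;
    if (m->filled == 4) {
        /* Assumed byte order for the sketch: low byte first. */
        m->last_addr = (uint32_t)m->bytes[0]
                     | ((uint32_t)m->bytes[1] << 8)
                     | ((uint32_t)m->bytes[2] << 16)
                     | ((uint32_t)m->bytes[3] << 24);
        m->accepted++;
        m->filled = 0;
    }
}

/* Write only the first n bytes; n < 4 models a writer that got interrupted. */
static void mbx_write_partial(struct mock_mbx *m, uint32_t addr, int n)
{
    for (int i = 0; i < n; i++)
        mbx_write_byte(m, (uint8_t)(addr >> (8 * i)));
}

/* Post a whole address as one unit; refuses if mbxout is busy. */
static bool mbx_post(struct mock_mbx *m, uint32_t addr)
{
    if (mbx_busy(m))
        return false;
    mbx_write_partial(m, addr, 4);
    return true;
}
```

If one writer gets only two bytes out via mbx_write_partial() and never
resumes, every later mbx_post() sees busy; when each writer always
completes all four bytes without interference, both posts go through.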

This problem was solved long ago; according to Adaptec, the CPU will never
see mbxout_wait otherwise, because the 1740 is so incredibly fast that it
handles a new SCSI command descriptor before the CPU can ever catch it
with mbxout_wait set.

Today:

When I last looked at the 1740 driver, in the (impossible) case that it
catches the 1740 with mbxout_wait set, it just spits out this warning
message and does not do much about it. However, there is a problem:
printk re-enables interrupts, which opens the possibility of a race
condition in this critical area of the 1740 driver.

So my question: Apart from this (warning) message, does your system
continue to run, or does it lock up instantly (which could be a side
effect of the printk)?

I would suspect that your setup is just too fast and sees the impossible
(mbxout still busy). I'd assume a tiny patch that makes it loop a few more
times until mbxout is no longer busy is all you need.
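
Roughly what I mean by looping, as a stand-alone C sketch rather than an
actual patch: wait_mbx_free(), the busy_fn callback and busy_countdown()
are made-up names for illustration, and a real driver would read the
status from the board instead of calling a callback.

```c
#include <stdbool.h>

/* Hypothetical status callback: returns true while mbxout is still busy. */
typedef bool (*busy_fn)(void *ctx);

/*
 * Poll until mbxout is free, but give up after max_tries polls.
 * Returns the number of busy polls seen before it was free, or -1 on
 * timeout, so the caller can still warn instead of spinning forever.
 */
static int wait_mbx_free(busy_fn busy, void *ctx, int max_tries)
{
    for (int tries = 0; tries < max_tries; tries++) {
        if (!busy(ctx))
            return tries;
    }
    return -1;
}

/* Stand-in for the hardware: reports busy for the first *ctx polls. */
static bool busy_countdown(void *ctx)
{
    int *remaining = ctx;

    if (*remaining > 0) {
        (*remaining)--;
        return true;
    }
    return false;
}
```

The bound matters: if the board really is stuck, the loop must end and
fall back to the existing warning path rather than hang the machine.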

What's your kernel version? I could try getting the same (I'm a bit out 
of sync with current ALPHA versions) and providing another tiny patch to the 
174x driver.

Comments from anyone else (Brad, Andreas?),
Michael.

(eowmob@exp-math.uni-essen.de or  eowmob@pollux.exp-math.uni-essen.de
Please do not use my vm or de0hrz1a accounts anymore. In case of real
problems reaching me try mat42b@spi.power.uni-essen.de instead.)
