[101] in linux-scsi channel archive

BusLogic SCSI Driver

daemon@ATHENA.MIT.EDU (Leonard N. Zubkoff)
Sat Mar 18 15:37:37 1995

Date: Sat, 18 Mar 1995 10:55:20 -0800
From: "Leonard N. Zubkoff" <lnz@dandelion.com>
To: gentzel@nova.enet.dec.com, linux-scsi@vger.rutgers.edu,
        Linus.Torvalds@cs.Helsinki.FI

I am doing some testing using a BT-946C PCI controller in place of the BT-445S
VL-BUS controller I had been using.  I am encountering the same problem others
have reported whereby there are a large number of messages of the form:

Mar 18 10:17:42 kelewan kernel: BusLogic SCSI: buslogic_interrupt: interrupt received, but no mail.

This problem only occurred very occasionally with the BT-445S, but occurs often
enough with the BT-946C that something needs to be done.  I am investigating
the BusLogic driver's interrupt handler and have asked BusLogic a couple of
questions, as well as reading over the latest versions of their manuals.  From
my reading and my discussions with BusLogic, it appears the warning the driver
prints is unwarranted, except perhaps as a debugging aid for the driver writers.
Because of the way the cards operate, it is entirely reasonable for one
invocation of the interrupt handler to handle more than one command completion,
and for the timing to be such that a subsequent interrupt is generated with
nothing to do.
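The following is a minimal userspace model of that behaviour. It is a sketch, not BusLogic driver code. It assumes a ring of 16 incoming mailboxes, and the names `mbi_ready`, `card_post_completion` and `handler_drain` are hypothetical. The card posts two completions before the handler first runs. The handler drains every ready mailbox, so the interrupt for the second completion finds nothing left to do.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: 16 incoming mailboxes used in ring order. */
#define NUM_IN_MAILBOXES 16

static bool mbi_ready[NUM_IN_MAILBOXES]; /* true = completion posted */
static int card_next;   /* next mailbox the card will fill */
static int driver_next; /* next mailbox the handler will examine */

/* Card side: post one completion into the next incoming mailbox.  On real
   hardware this would also set IMBL/INTV and raise the IRQ line. */
static void card_post_completion(void)
{
    assert(!mbi_ready[card_next]);          /* ring must not overflow */
    mbi_ready[card_next] = true;
    card_next = (card_next + 1) % NUM_IN_MAILBOXES;
}

/* Driver side: one interrupt handler invocation.  It services every ready
   mailbox, not only the one whose interrupt caused the call, and returns
   the number serviced (the "N serviced" figure in the event traces). */
static int handler_drain(void)
{
    int serviced = 0;
    while (mbi_ready[driver_next]) {
        mbi_ready[driver_next] = false;     /* hand mailbox back to the card */
        driver_next = (driver_next + 1) % NUM_IN_MAILBOXES;
        serviced++;
    }
    return serviced;
}
```

If you call `card_post_completion()` twice and then `handler_drain()` once per interrupt, the two calls return 2 and then 0. The second call is exactly the "no mail" case, and no completion was lost.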

To verify that this was happening, I implemented a simple event logging
facility that records interesting driver events in the ramdisk for later
extraction and analysis.  All of these "no mail" conditions were of the
form:

 1 10  1 13	Entry
 1 10  1 13	Flags	81	(incoming mailbox loaded)
 1 10  1 13	INTR	81	(reset interrupt)
 1 10  1 13	MBI	31	(mailbox 31 was found to be ready)
 1 10  1 13	Flags	00
 1 10  1 13	INTR	00
 1 10  1 13	MBI	16
 1 10  1 13	Flags	00
 1 10  1 13	INTR	00
 1 10  1 13	Exit	(2 serviced)

 1 10  1 14	Entry
 1 10  1 14	Flags	81
 1 10  1 14	INTR	81
 2 10  1 14	Exit	(0 serviced)

The four leftmost columns are the values of kstat.interrupts[i] for i = 0
(timer), 5 (serial ports), 10 (ethernet), and 11 (BusLogic).  This gives a
crude indication of timing and of other interrupt activity that might be
interfering.
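An event log of this kind could be sketched as a fixed in-memory ring of records. Each record snapshots the four interrupt counters and stores the event type and value. This is an illustration only; the original facility's layout is not described here. `irq_counts[]` is a hypothetical stand-in for kstat.interrupts[]. Keeping only the low byte of each counter is an assumption made to keep records small. The buffer is a plain static array rather than the ramdisk.

```c
#include <assert.h>
#include <stddef.h>

#define EVLOG_SIZE    1024  /* hypothetical capacity */
#define NR_IRQS_MODEL 16

enum ev_type { EV_ENTRY, EV_FLAGS, EV_INTR, EV_MBI, EV_EXIT, EV_LOST };

struct ev_record {
    unsigned char irqsnap[4]; /* low byte of counts for irqs 0, 5, 10, 11 */
    enum ev_type type;
    unsigned int value;       /* register value, mailbox index or count */
};

/* Hypothetical stand-in for kstat.interrupts[]. */
static unsigned int irq_counts[NR_IRQS_MODEL];
static const int watched_irqs[4] = { 0, 5, 10, 11 };

static struct ev_record evlog[EVLOG_SIZE];
static size_t ev_head;  /* records written so far; slot = ev_head % size */

/* Append one event, overwriting the oldest record once the ring is full. */
static void ev_log(enum ev_type type, unsigned int value)
{
    struct ev_record *r = &evlog[ev_head % EVLOG_SIZE];
    for (int i = 0; i < 4; i++) {
        assert(watched_irqs[i] < NR_IRQS_MODEL);
        r->irqsnap[i] = (unsigned char)irq_counts[watched_irqs[i]];
    }
    r->type = type;
    r->value = value;
    ev_head++;
}
```

Snapshotting the counters on every record, rather than timestamps, matches the crude timing indication described above. A gap in the irq 10 column between two records shows that an ethernet interrupt intervened.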

Since the board must place the pointer into the mailbox before setting IMBL
(incoming mailbox loaded) and INTV (interrupt valid), a "no mail" interrupt just
means that an earlier invocation of the handler found and processed the mailbox
before the card set these bits.  I will shortly test a modification to the
interrupt handler to ignore this condition.
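One possible shape for that modification is a small classification step run after the mailbox scan. It counts the benign case instead of printing a warning. This is a hypothetical sketch, not the change actually made. The bit values assume, as the value 81 in the trace suggests, that INTV is 0x80 and IMBL is 0x01. `struct host_stats` and `account_interrupt` are invented names.

```c
#include <assert.h>
#include <stdbool.h>

/* Bit values inferred from the trace: 0x81 = INTV | IMBL. */
#define INTR_INTV 0x80  /* interrupt valid */
#define INTR_IMBL 0x01  /* incoming mailbox loaded */

/* Hypothetical per-host counters, kept instead of printing a warning. */
struct host_stats {
    unsigned long interrupts;
    unsigned long empty_interrupts; /* IMBL set, mailbox already serviced */
};

/* Classify one handler invocation after its mailbox scan.
   intr_reg: interrupt register value read on entry.
   serviced: number of incoming mailboxes the scan completed.
   Returns true only when the caller should still log a message. */
static bool account_interrupt(struct host_stats *st,
                              unsigned char intr_reg, int serviced)
{
    assert(serviced >= 0);
    st->interrupts++;
    if (serviced > 0)
        return false;                   /* normal completion(s) */
    if ((intr_reg & (INTR_INTV | INTR_IMBL)) == (INTR_INTV | INTR_IMBL)) {
        /* An earlier invocation drained the mailbox before the card set
           IMBL/INTV for it: the benign "no mail" case.  Count it only. */
        st->empty_interrupts++;
        return false;
    }
    /* Nothing serviced and IMBL not set: not the benign case above, so
       let the caller decide whether it deserves a message. */
    return true;
}
```

Keeping a counter rather than discarding the event entirely preserves the information for anyone debugging the driver later.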

A much more serious problem is that interrupts are sometimes lost:

Mar 18 10:16:12 kelewan kernel: scsi : aborting command due to timeout : pid 402607, scsi0, id 1, lun 0 Write (10) 00 00 28 0f ac 00 00 02 00 
Mar 18 10:16:12 kelewan kernel: BusLogic SCSI: buslogic_abort: 10 81
Mar 18 10:16:12 kelewan kernel: BusLogic SCSI: buslogic_abort: lost interrupt discovered on irq 11 - attempting to recover...

I see a fair number of these on the 946C (a couple per hour under load) and am
at a loss to understand how we can be missing interrupts.  The event trace for
this type of occurrence is:

12  9  6  1	Entry
12  9  6  1	Flags	81
12  9  6  1	INTR	81
12  9  6  1	MBI	23
12  9  6  1	Flags	00
12  9  6  1	INTR	00
12  9  6  1	MBI	24
12  9  6  1	Flags	00
12  9  6  1	INTR	00
12  9  6  1	Exit	(2 serviced)

 3  9  6  1	LOST INTERRUPT (mbi 25)
 3  9  6  1	Entry			(simulated interrupt to recover)
 3  9  6  1	Flags	81
 3  9  6  1	INTR	81
 3  9  6  1	MBI	25
 3  9  6  1	Flags	00
 3  9  6  1	INTR	00
 3 10  6  1	MBI	26
 3 10  6  1	Flags	00
 3 10  6  1	INTR	00
 3 10  6  1	Exit	(2 serviced)

and

 8  6  6 15	Entry
 8  6  6 15	Flags	81
 8  6  6 15	INTR	81
 8  6  6 15	MBI	28
 8  6  6 15	Flags	00
 8  6  6 15	INTR	00
 8  6  6 15	MBI	29
 8  6  6 15	Flags	00
 8  6  6 15	INTR	00
 8  6  6 15	Exit	(2 serviced)

 2  6  6 15	LOST INTERRUPT (mbi 30)
 2  6  6 15	Entry
 2  6  6 15	Flags	81
 2  6  6 15	INTR	81
 2  6  6 15	MBI	30
 2  6  6 15	Flags	00
 2  6  6 15	INTR	00
 2  6  6 15	MBI	31
 2  6  6 15	Flags	00
 2  6  6 15	INTR	00
 2  6  6 15	MBI	16
 2  6  6 15	Flags	00
 2  6  6 15	INTR	00
 2  6  6 15	MBI	17
 2  6  6 15	Flags	00
 2  6  6 15	INTR	00
 2  6  6 15	MBI	18
 2  6  6 15	Flags	00
 2  6  6 15	INTR	00
 2  6  6 15	MBI	19
 2  6  6 15	Flags	00
 2  6  6 15	INTR	00
 2  6  6 15	MBI	20
 2  6  6 15	Flags	00
 2  6  6 15	INTR	00
 2  6  6 15	MBI	21
 2  6  6 15	Flags	00
 2  6  6 15	INTR	00
 2  6  6 15	Exit	(8 serviced)

Somehow the card is posting an interrupt that we are not seeing, but the
subsequent SCSI timeout recovery *always* succeeds.

I'd like to either (1) correct this so that we are not losing interrupts, (2)
poll the card manually at reasonable intervals so we don't wait too long to
discover the lost interrupt, or at least (3) detect this in a way that the
bogus abort messages don't appear.
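Option (2) could look roughly like the sketch below, written as a userspace model. A periodic check looks at the next incoming mailbox the driver expects. If a completion is already sitting there, its interrupt was presumably missed, so the check services it without waiting for the mid-level timeout. The timer mechanism, the poll interval and all names here are hypothetical. In a real driver the poll would have to run with interrupts disabled so that it cannot race with the interrupt handler. A completion found by the poll may also be one whose interrupt simply has not been delivered yet, so `lost_recovered` is an upper bound on lost interrupts. If the poll interval is well below the command timeout, this would also give option (3), since no abort would be issued.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_IN_MAILBOXES 16   /* hypothetical ring size */

struct host_model {
    bool mbi_ready[NUM_IN_MAILBOXES];
    int next_mbi;                  /* next mailbox the driver will examine */
    unsigned long lost_recovered;  /* completions found by polling */
};

/* Normal mailbox scan, shared by the interrupt handler and the poller. */
static int scan_mailboxes(struct host_model *h)
{
    int serviced = 0;
    while (h->mbi_ready[h->next_mbi]) {
        h->mbi_ready[h->next_mbi] = false;
        h->next_mbi = (h->next_mbi + 1) % NUM_IN_MAILBOXES;
        serviced++;
    }
    return serviced;
}

/* Called periodically (hypothetical timer, interval well below the SCSI
   command timeout).  If a completion is waiting in the next mailbox,
   service it now instead of leaving it for the timeout and abort path. */
static int poll_for_lost_interrupt(struct host_model *h)
{
    assert(h->next_mbi >= 0 && h->next_mbi < NUM_IN_MAILBOXES);
    if (!h->mbi_ready[h->next_mbi])
        return 0;
    int n = scan_mailboxes(h);
    h->lost_recovered += (unsigned long)n;
    return n;
}
```

Sharing `scan_mailboxes` between the handler and the poller keeps both paths advancing the same ring pointer. This mirrors what the simulated interrupt in the traces above already does.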

I'd appreciate any suggestions for how to avoid lost interrupts.  From what
I've seen, I believe the low-level interrupt handling code leaves interrupt 11
blocked while another interrupt 11 is being processed.  But if this is the
case, is it guaranteed that a pending interrupt will be seen when irq 11 is
enabled again during exit back through the low-level code?  I've not seen a
description of precisely what happens on hardware interrupts, so any input on
this would be appreciated.

I plan some minor changes to the BusLogic interrupt handler to investigate this
situation further.

		Leonard