[101] in linux-scsi channel archive
BusLogic SCSI Driver
daemon@ATHENA.MIT.EDU (Leonard N. Zubkoff)
Sat Mar 18 15:37:37 1995
Date: Sat, 18 Mar 1995 10:55:20 -0800
From: "Leonard N. Zubkoff" <lnz@dandelion.com>
To: gentzel@nova.enet.dec.com, linux-scsi@vger.rutgers.edu,
Linus.Torvalds@cs.Helsinki.FI
I am doing some testing using a BT-946C PCI controller in place of the BT-445S
VL-BUS controller I had been using. I am encountering the same problem others
have reported whereby there are a large number of messages of the form:
Mar 18 10:17:42 kelewan kernel: BusLogic SCSI: buslogic_interrupt: interrupt received, but no mail.
This problem only occurred very occasionally with the BT-445S, but occurs often
enough with the BT-946C that something needs to be done. I am investigating
the BusLogic driver's interrupt handler and have asked a couple of questions of
BusLogic, as well as reading over the latest version of their manuals. From my
readings and discussion with BusLogic, it appears the warning the driver prints
is unwarranted except perhaps as a debugging tool for the driver writers.
Because of the way the cards operate, it is entirely reasonable for one
invocation of the interrupt handler to handle more than one command completion,
and for the timing to be such that a subsequent interrupt is generated with
nothing to do.
To verify that this was happening, I implemented a simple event logging
facility that stores interesting events in the driver into the ramdisk for
later extraction and analysis. All of these "no mail" conditions were of the
form:
1 10 1 13 Entry
1 10 1 13 Flags 81 (incoming mailbox loaded)
1 10 1 13 INTR 81 (reset interrupt)
1 10 1 13 MBI 31 (mailbox 31 was found to be ready)
1 10 1 13 Flags 00
1 10 1 13 INTR 00
1 10 1 13 MBI 16
1 10 1 13 Flags 00
1 10 1 13 INTR 00
1 10 1 13 Exit (2 serviced)
1 10 1 14 Entry
1 10 1 14 Flags 81
1 10 1 14 INTR 81
2 10 1 14 Exit (0 serviced)
The four leftmost columns of each trace line are the values of
kstat.interrupts[i] for i = 0 (timer), 5 (serial ports), 10 (ethernet), and
11 (BusLogic). This gives a crude indication of timing and of other interrupt
activity that might be interfering.
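For readers who want to try something similar, an event log of this kind can be sketched as a fixed-size ring buffer. This is only an illustration and not the code used to produce the traces above: the names ev_log, evlog_record, evlog_count and evlog_get, the capacity EVLOG_SIZE, and the record layout are all hypothetical, and the real facility stored its events in the ramdisk rather than in an ordinary array.

```c
#include <assert.h>
#include <string.h>

#define EVLOG_SIZE 256              /* hypothetical capacity, a power of two */

/* One logged event: a snapshot of four interrupt counters plus a short tag. */
struct ev_entry {
    unsigned int irq_counts[4];
    char text[32];
};

struct ev_log {
    struct ev_entry entries[EVLOG_SIZE];
    unsigned long head;             /* total number of events ever recorded */
};

/* Record an event, overwriting the oldest entry once the buffer is full. */
static void evlog_record(struct ev_log *log, const unsigned int counts[4],
                         const char *text)
{
    struct ev_entry *e = &log->entries[log->head % EVLOG_SIZE];

    memcpy(e->irq_counts, counts, sizeof e->irq_counts);
    strncpy(e->text, text, sizeof e->text - 1);
    e->text[sizeof e->text - 1] = '\0';
    log->head++;
}

/* Number of events currently retained (at most EVLOG_SIZE). */
static unsigned long evlog_count(const struct ev_log *log)
{
    return log->head < EVLOG_SIZE ? log->head : EVLOG_SIZE;
}

/* Fetch a retained event; index 0 is the oldest one still in the buffer. */
static const struct ev_entry *evlog_get(const struct ev_log *log,
                                        unsigned long i)
{
    unsigned long n = evlog_count(log);

    assert(i < n);
    return &log->entries[(log->head - n + i) % EVLOG_SIZE];
}
```

Recording into a preallocated buffer keeps the cost inside the interrupt handler to a couple of copies, and the events can be formatted and printed later, outside interrupt context.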
Since the board must place the pointer into the mailbox before setting IMBL
(incoming mailbox loaded) and INTV (interrupt valid), a "no mail" interrupt just
means that the previous invocation of the handler found and processed the
mailbox before the card set these bits. I will shortly test a modification to
the interrupt handler to ignore this condition.
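The behaviour described above can be reproduced with a small user-space model. This is a sketch under stated assumptions, not the driver's actual handler: struct sim_card, sim_card_post and sim_interrupt are hypothetical names, the ring size is arbitrary, and the bit values 0x80 (INTV) and 0x01 (IMBL) are inferred from the "Flags 81" lines in the trace. The handler drains every loaded mailbox in one pass and simply counts an empty pass instead of printing a warning.

```c
#include <assert.h>
#include <stddef.h>

#define MBI_COUNT 32        /* hypothetical ring size for this model */
#define MBI_FREE  0x00      /* mailbox holds no completion */
#define INT_IMBL  0x01      /* incoming mailbox loaded (inferred from "81") */
#define INT_INTV  0x80      /* interrupt valid (inferred from "81") */

struct sim_card {
    unsigned char intr_reg;              /* models the interrupt register */
    unsigned char mbi_status[MBI_COUNT]; /* nonzero means a completion is waiting */
    size_t next_mbi;                     /* next mailbox the handler will examine */
    unsigned long completed;             /* completions processed so far */
    unsigned long spurious;              /* handler runs that found no mail */
};

/* Card side: load a mailbox first, then raise the interrupt bits. */
static void sim_card_post(struct sim_card *card, size_t slot)
{
    assert(slot < MBI_COUNT);
    card->mbi_status[slot] = 0x01;       /* placeholder completion code */
    card->intr_reg = INT_INTV | INT_IMBL;
}

/* Host side: reset the interrupt, then drain every loaded mailbox. */
static int sim_interrupt(struct sim_card *card)
{
    int serviced = 0;

    assert(card->next_mbi < MBI_COUNT);
    if (card->intr_reg & INT_INTV)
        card->intr_reg = 0;              /* models the "reset interrupt" step */

    while (serviced < MBI_COUNT &&
           card->mbi_status[card->next_mbi] != MBI_FREE) {
        card->mbi_status[card->next_mbi] = MBI_FREE;
        card->next_mbi = (card->next_mbi + 1) % MBI_COUNT;
        card->completed++;
        serviced++;
    }

    /* An empty pass is expected now and then: count it, do not warn. */
    if (serviced == 0)
        card->spurious++;
    return serviced;
}
```

Posting two completions before the first handler run yields "2 serviced"; a second run triggered by the already-pending interrupt then yields "0 serviced", which matches the shape of the trace.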
A much more serious problem is that interrupts are sometimes lost:
Mar 18 10:16:12 kelewan kernel: scsi : aborting command due to timeout : pid 402607, scsi0, id 1, lun 0 Write (10) 00 00 28 0f ac 00 00 02 00
Mar 18 10:16:12 kelewan kernel: BusLogic SCSI: buslogic_abort: 10 81
Mar 18 10:16:12 kelewan kernel: BusLogic SCSI: buslogic_abort: lost interrupt discovered on irq 11 - attempting to recover...
I see a fair number of these on the 946C (a couple per hour under load) and am
at a loss to understand how we can be missing interrupts. The event trace for
this type of occurrence is:
12 9 6 1 Entry
12 9 6 1 Flags 81
12 9 6 1 INTR 81
12 9 6 1 MBI 23
12 9 6 1 Flags 00
12 9 6 1 INTR 00
12 9 6 1 MBI 24
12 9 6 1 Flags 00
12 9 6 1 INTR 00
12 9 6 1 Exit (2 serviced)
3 9 6 1 LOST INTERRUPT (mbi 25)
3 9 6 1 Entry (simulated interrupt to recover)
3 9 6 1 Flags 81
3 9 6 1 INTR 81
3 9 6 1 MBI 25
3 9 6 1 Flags 00
3 9 6 1 INTR 00
3 10 6 1 MBI 26
3 10 6 1 Flags 00
3 10 6 1 INTR 00
3 10 6 1 Exit (2 serviced)
and
8 6 6 15 Entry
8 6 6 15 Flags 81
8 6 6 15 INTR 81
8 6 6 15 MBI 28
8 6 6 15 Flags 00
8 6 6 15 INTR 00
8 6 6 15 MBI 29
8 6 6 15 Flags 00
8 6 6 15 INTR 00
8 6 6 15 Exit (2 serviced)
2 6 6 15 LOST INTERRUPT (mbi 30)
2 6 6 15 Entry
2 6 6 15 Flags 81
2 6 6 15 INTR 81
2 6 6 15 MBI 30
2 6 6 15 Flags 00
2 6 6 15 INTR 00
2 6 6 15 MBI 31
2 6 6 15 Flags 00
2 6 6 15 INTR 00
2 6 6 15 MBI 16
2 6 6 15 Flags 00
2 6 6 15 INTR 00
2 6 6 15 MBI 17
2 6 6 15 Flags 00
2 6 6 15 INTR 00
2 6 6 15 MBI 18
2 6 6 15 Flags 00
2 6 6 15 INTR 00
2 6 6 15 MBI 19
2 6 6 15 Flags 00
2 6 6 15 INTR 00
2 6 6 15 MBI 20
2 6 6 15 Flags 00
2 6 6 15 INTR 00
2 6 6 15 MBI 21
2 6 6 15 Flags 00
2 6 6 15 INTR 00
2 6 6 15 Exit (8 serviced)
Somehow the card is posting an interrupt that we are not seeing, but subsequent
SCSI timeout recovery *always* recovers successfully.
I'd like to either (1) correct this so that we are not losing interrupts, (2)
poll the card manually at reasonable intervals so we don't wait too long to
discover the lost interrupt, or at least (3) detect this in a way that the
bogus abort messages don't appear.
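Option (2) could look roughly like the following user-space sketch. Everything here is hypothetical: struct poll_state, drain_mailboxes, poll_tick and the interval POLL_INTERVAL_TICKS are illustration names and values, not part of the existing driver. The idea is only that a periodic check scans the incoming mailboxes directly, so a completion whose interrupt was lost is picked up after at most one polling interval instead of after a SCSI timeout.

```c
#include <assert.h>
#include <stddef.h>

#define MBI_COUNT 32              /* hypothetical ring size for this model */
#define MBI_FREE  0x00            /* mailbox holds no completion */
#define POLL_INTERVAL_TICKS 10    /* hypothetical polling interval */

struct poll_state {
    unsigned char mbi_status[MBI_COUNT]; /* nonzero means a completion is waiting */
    size_t next_mbi;                     /* next mailbox to examine */
    unsigned long last_poll;             /* tick at which we last scanned */
    unsigned long recovered;             /* completions found by polling */
};

/* Process every loaded mailbox, starting at next_mbi; return the count. */
static int drain_mailboxes(struct poll_state *s)
{
    int n = 0;

    assert(s->next_mbi < MBI_COUNT);
    while (n < MBI_COUNT && s->mbi_status[s->next_mbi] != MBI_FREE) {
        s->mbi_status[s->next_mbi] = MBI_FREE;
        s->next_mbi = (s->next_mbi + 1) % MBI_COUNT;
        n++;
    }
    return n;
}

/* Call on every timer tick; scans only once per polling interval. */
static int poll_tick(struct poll_state *s, unsigned long now)
{
    int n;

    if (now - s->last_poll < POLL_INTERVAL_TICKS)
        return 0;
    s->last_poll = now;
    n = drain_mailboxes(s);
    s->recovered += (unsigned long)n;
    return n;
}
```

In a real driver the scan would have to be serialized against the interrupt handler, for example by running it with interrupts disabled, so that the two never process the same mailbox.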
I'd appreciate any suggestions for how to avoid lost interrupts. From what
I've seen, I believe the low level interrupt handling code leaves interrupt 11
blocked while another interrupt 11 is being processed. But if this is the
case, is it guaranteed that a pending interrupt will be seen when irq 11 is
enabled again during exit back through the low level code? I've not seen a
description of precisely what happens on hardware interrupts, so any input on
this would be appreciated.
I plan some minor changes to the BusLogic interrupt handler to investigate this
situation further.
Leonard