[2398] in linux-scsi channel archive
Re: Problem with ASUS PCI-SC875
daemon@ATHENA.MIT.EDU (Gerard Roudier)
Sun Aug 31 18:30:47 1997
Date: Sun, 31 Aug 1997 23:56:32 +0200 (MET DST)
From: Gerard Roudier <groudier@club-internet.fr>
To: Edward Welbon <welbon@bga.com>
cc: linux-scsi@vger.rutgers.edu
In-Reply-To: <Pine.LNX.3.96.970830232229.335A-100000@max1-77.ip.realtime.net>
On Sat, 30 Aug 1997, Edward Welbon wrote:
> Gerard, when running Bonnie with a 2047MB file, my system locked up, and
> had to be rebooted (though not power cycled); the only error messages I
> could recover were these:
>
> From /var/log/syslog
>
> Aug 30 22:21:00 tarantula kernel: ncr53c875-0: unknown interrupt(s) ignored sist=40 dstat=0
Normally, this should never happen.
This message means that the SIP bit was set in the ISTAT register and that
the driver read the value 0x40 (CMP: arbitration complete) from the SIST
register. This interrupt condition is 'non-fatal' and is masked by the
driver. According to the Symbios docs, the chip should not have set the
SIP bit in such a situation.
> Aug 30 22:21:00 tarantula kernel: ncr53c875-0:0: ERROR (81:1) (e-aa-0) (f/9d) @ script (164:72580000).
DSTAT=81 means DMA fifo empty + SCRIPTS illegal instruction.
SIST = 1 means SCSI parity error.
SOCL = 0xe means 8=ATN 6=MSG_OUT (SCSI output control latch)
SBCL = 0xaa means 0x80=REQ 0x20=BSY 8=ATN 2=MSG_IN (SCSI BUS control line)
SBDL = 0 (SCSI bus data lines)
A SCSI parity error seems to have occurred in MSG_IN phase. I do not have
an explanation for the 'illegal instruction' condition. I will try to
find one.
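To make the DSTAT and SIST decoding above easier to reproduce, here is a
minimal sketch in C that turns a register value into a list of bit names.
The bit values in the tables are assumptions, based on the values quoted
in this message and on the usual 53C8xx register layout. decode_bits() and
the two tables are hypothetical names used for illustration only; they are
not taken from the driver.

```c
#include <stdio.h>
#include <string.h>

/* One named bit of a status register. */
struct bit_name {
    unsigned mask;
    const char *name;
};

/* ASSUMPTION: DSTAT bit values as in the usual 53C8xx layout. */
static const struct bit_name dstat_bits[] = {
    { 0x80, "DFE"  },   /* DMA fifo empty */
    { 0x40, "MDPE" },   /* master data parity error */
    { 0x20, "BF"   },   /* bus fault */
    { 0x10, "ABRT" },   /* aborted */
    { 0x08, "SSI"  },   /* SCRIPTS single step */
    { 0x04, "SIR"  },   /* SCRIPTS interrupt instruction */
    { 0x01, "IID"  },   /* illegal instruction detected */
    { 0, NULL }
};

/* ASSUMPTION: SIST bit values (low byte only) as in the usual layout. */
static const struct bit_name sist_bits[] = {
    { 0x80, "MA"  },    /* phase mismatch */
    { 0x40, "CMP" },    /* arbitration complete */
    { 0x20, "SEL" },    /* selected */
    { 0x10, "RSL" },    /* reselected */
    { 0x08, "SGE" },    /* SCSI gross error */
    { 0x04, "UDC" },    /* unexpected disconnect */
    { 0x02, "RST" },    /* SCSI reset detected */
    { 0x01, "PAR" },    /* SCSI parity error */
    { 0, NULL }
};

/* Write the names of all set bits into out, separated by '|'.
   Bits that are not in the table are silently skipped. */
static char *decode_bits(unsigned value, const struct bit_name *table,
                         char *out, size_t outlen)
{
    size_t used = 0;

    if (outlen == 0)
        return out;
    out[0] = '\0';
    for (; table->name != NULL; table++) {
        int n;

        if (!(value & table->mask))
            continue;
        n = snprintf(out + used, outlen - used, "%s%s",
                     used ? "|" : "", table->name);
        if (n < 0 || (size_t)n >= outlen - used)
            break;      /* output buffer too small: stop here */
        used += (size_t)n;
    }
    return out;
}
```

With these assumed tables, decode_bits(0x81, dstat_bits, buf, sizeof buf)
gives "DFE|IID" and decode_bits(0x01, sist_bits, buf, sizeof buf) gives
"PAR", which matches the reading of the (81:1) pair in the ERROR line.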
For the moment, I have the following ideas about the possible causes:
1 - The interrupt routine could have been wrongly reentered for the
same controller.
The fact that your system got SCSI parity errors increases the
probability that the chip will stack 2 SCSI errors (for example, phase
mismatch + SCSI parity error). If interrupts are enabled, a reentrancy
problem might occur.
The driver sets the SA_INTERRUPT flag when requesting its IRQs. Fast
IRQs are dispatched with interrupts disabled.
The driver does not enable interrupts in its interrupt service
routine, so interrupt reentrancy should not occur.
2 - The 875 does not behave as expected and may raise the SIP bit in ISTAT
on a masked non-fatal interrupt condition.
3 - I have misunderstood something in the IRQ handling of Linux and/or
in the Symbios docs.
4 - The problem could be due to a bug in the driver that I have not found yet.
For now, I am leaning towards explanation 1. But all of them are possible,
and there are probably others.
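Explanation 2 can be illustrated with a small model in C. It only checks
whether a pair of status values contains at least one condition that the
driver has enabled. It is a simplified sketch, not the driver's actual
logic: has_enabled_condition() and the macro names are hypothetical, the
two masks are built from the flag names used in the driver's SIEN/DIEN
setup, and the numeric bit values are assumptions based on the usual
53C8xx register layout.

```c
#include <stdbool.h>

/* ASSUMPTION: SIST/SIEN bit values (usual 53C8xx layout). */
#define X_STO  0x0400   /* selection time-out */
#define X_HTH  0x0100   /* handshake time-out */
#define X_MA   0x0080   /* phase mismatch */
#define X_CMP  0x0040   /* arbitration complete */
#define X_SGE  0x0008   /* SCSI gross error */
#define X_UDC  0x0004   /* unexpected disconnect */
#define X_RST  0x0002   /* SCSI reset detected */
#define X_PAR  0x0001   /* SCSI parity error */

/* ASSUMPTION: DSTAT/DIEN bit values (usual 53C8xx layout). */
#define X_DFE  0x80     /* DMA fifo empty (status only) */
#define X_MDPE 0x40
#define X_BF   0x20
#define X_ABRT 0x10
#define X_SSI  0x08
#define X_SIR  0x04
#define X_IID  0x01

/* SCSI and DMA interrupt conditions enabled without the parity bit. */
#define X_SIEN_MASK (X_STO | X_HTH | X_MA | X_SGE | X_UDC | X_RST)
#define X_DIEN_MASK (X_MDPE | X_BF | X_ABRT | X_SSI | X_SIR | X_IID)

/* Return true if at least one pending condition is enabled.
   If this is false while the chip still reports an interrupt,
   the chip has signalled a masked condition. */
static bool has_enabled_condition(unsigned sist, unsigned dstat,
                                  unsigned sien, unsigned dien)
{
    return (sist & sien) != 0 || (dstat & dien) != 0;
}
```

Under these assumptions, the values from the first log line (sist=40,
dstat=0) contain no enabled condition, so
has_enabled_condition(0x40, 0x00, X_SIEN_MASK, X_DIEN_MASK) is false,
whereas the (81:1) pair from the ERROR line is accepted because IID is
enabled in the DMA mask.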
> Aug 30 22:21:00 tarantula kernel: ncr53c875-0: script cmd = 800c0001
> Aug 30 22:21:00 tarantula kernel: ncr53c875-0: regdump: da 10 80 9d 47 0f 00 0e 00 0e 80 aa 80 00 02 00.
> Aug 30 22:21:00 tarantula kernel: ncr53c875-0: have to clear fifos.
> Aug 30 22:21:00 tarantula kernel: ncr53c875-0: resetting, command processing suspended for 2 seconds
> Aug 30 22:21:00 tarantula kernel: ncr53c875-0: enabling clock multiplier
> Aug 30 22:21:00 tarantula kernel: ncr53c875-0: copying script fragments into the on-board RAM ...
> Aug 30 22:21:02 tarantula kernel: ncr53c875-0: command processing resumed
>
> From /var/log/messages
>
> Aug 30 22:21:00 tarantula kernel: ncr53c875-0: restart (scsi reset).
> Aug 30 22:21:02 tarantula kernel: ncr53c875-0-<1,0>: WIDE SCSI (16 bit) enabled.
> Aug 30 22:21:02 tarantula kernel: ncr53c875-0-<2,0>: WIDE SCSI (16 bit) enabled.
> Aug 30 22:21:02 tarantula kernel: ncr53c875-0-<0,0>: WIDE SCSI (16 bit) enabled.
> Aug 30 22:21:02 tarantula kernel: ncr53c875-0-<1,0>: FAST-20 WIDE SCSI 40.0 MB/s (50 ns, offset 15)
> Aug 30 22:21:02 tarantula kernel: ncr53c875-0-<2,0>: FAST-20 WIDE SCSI 40.0 MB/s (50 ns, offset 15)
> Aug 30 22:21:02 tarantula kernel: ncr53c875-0-<0,0>: FAST-20 WIDE SCSI 40.0 MB/s (50 ns, offset 15)
>
> I have three asus 875 scsi cards, each has three disks, the total of nine
> disks are configured as an md raid0. Using initrd, I mount / on these.
> Is there a problem with the order in which the scsi controllers are
> restarted?
The order should not cause problems. Pending SCSI commands are normally
requeued to the driver.
Could you send me the description of your hardware and software
configuration, especially number of CPUs, IRQs assigned to 875 chips,
Linux version and patches used?
Could you also send me (directly) all syslog messages you got about the
problem?
Thanks a lot for your report. I'm impressed by your SCSI configuration.
Below is a patch that contains a workaround for 'explanation 2'.
(It also enables the interrupt for SCSI parity errors.)
You can try it if you think it can be useful.
Gerard.
--- linux/drivers/scsi/ncr53c8xx.c.2.5b.1 Sun Aug 31 09:46:45 1997
+++ linux/drivers/scsi/ncr53c8xx.c Sun Aug 31 21:30:25 1997
@@ -5973,7 +5973,7 @@
** enable ints
*/
- OUTW (nc_sien , STO|HTH|MA|SGE|UDC|RST);
+ OUTW (nc_sien , STO|HTH|MA|SGE|UDC|RST|PAR);
OUTB (nc_dien , MDPE|BF|ABRT|SSI|SIR|IID);
/*
@@ -6929,9 +6929,12 @@
ncr_int_sir (np);
return;
}
- if (!(sist & (SBMC|PAR)) && !(dstat & SSI))
- printf("%s: unknown interrupt(s) ignored sist=%x dstat=%x\n",
- ncr_name(np), sist, dstat);
+ if (!(sist & (SBMC|PAR)) && !(dstat & SSI)) {
+ printf( "%s: unknown interrupt(s) ignored, "
+ "ISTAT=%x DSTAT=%x SIST=%x\n",
+ ncr_name(np), istat, dstat, sist);
+ return;
+ }
OUTONB (nc_dcntl, (STD|NOCOM));
return;
};
----------------------- Cut here ---------------------------------