[2424] in linux-scsi channel archive


Re: ncr53c8xx error messages (long).

daemon@ATHENA.MIT.EDU (Gerard Roudier)
Fri Sep 5 17:56:19 1997

Date: 	Fri, 5 Sep 1997 22:48:41 +0200 (MET DST)
From: Gerard Roudier <groudier@club-internet.fr>
To: George Farris <george@ve7frg.gmsys.com>
cc: vger.rutgers.edu!linux-scsi@raven.cc.mala.bc.ca,
        linux-scsi@vger.rutgers.edu
In-Reply-To: <m0x70wo-0009fQC@gmsys.com>



On Fri, 5 Sep 1997, George Farris wrote:

> > ncr53c875-0-<2,0>: phase change 2-3 10@0000eca0 resid=9.

This message means that the device changed from COMMAND PHASE to STATUS
PHASE without accepting the remaining 9 bytes of a 10-byte SCSI command.
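
As a rough illustration of how the numbers in that message fit together, here is a small Python sketch that pulls the fields out of the log line. The field layout (old phase, new phase, length, address, residual) is an assumption inferred from this single line and the explanation above, not taken from the driver source; `parse_phase_change` is a hypothetical helper name, and the address is kept as an opaque hex string because its meaning is not stated here.

```python
import re

# Field layout inferred from the single log line in this thread (assumption).
PHASE_CHANGE_RE = re.compile(
    r"phase change (\d+)-(\d+) (\d+)@([0-9a-fA-F]+) resid=(\d+)"
)

def parse_phase_change(line):
    """Return a dict of the fields in a 'phase change' message, or None."""
    m = PHASE_CHANGE_RE.search(line)
    if m is None:
        return None
    old_phase, new_phase, length, address, resid = m.groups()
    length, resid = int(length), int(resid)
    return {
        "old_phase": int(old_phase),
        "new_phase": int(new_phase),
        "length": length,                # bytes the initiator wanted to send
        "address": address,              # kept as an opaque hex string
        "resid": resid,                  # bytes the target did not accept
        "transferred": length - resid,   # bytes actually accepted
    }

info = parse_phase_change(
    "ncr53c875-0-<2,0>: phase change 2-3 10@0000eca0 resid=9."
)
print(info["transferred"])  # prints 1
```

Under that reading, only 1 byte of the 10-byte command was accepted before the phase changed.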

Several causes are possible:
1 - A spurious COMMAND PHASE has been detected by the chip due to a bad
  signal level.
2 - The target is brain-dead.
3 - Something else.

- DATA OUT PHASE is 0: neither C/D nor I/O asserted
- COMMAND PHASE  is 2: C/D (COMMAND/DATA) asserted
- STATUS PHASE   is 3: C/D and I/O (INPUT/OUTPUT) asserted
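
The phase numbers above can be read as a small bit field. The sketch below assumes the usual SCSI encoding, with MSG as bit 2, C/D as bit 1 and I/O as bit 0. The three codes 0, 2 and 3 come from the list above; the other entries are filled in from the SCSI standard, not from this message. The function names are hypothetical.

```python
# SCSI bus phase as a 3-bit code: MSG (bit 2), C/D (bit 1), I/O (bit 0).
# Codes 0, 2 and 3 are the ones listed in the post; the rest follow the
# SCSI standard's phase table.
PHASE_NAMES = {
    0: "DATA OUT",
    1: "DATA IN",
    2: "COMMAND",
    3: "STATUS",
    6: "MESSAGE OUT",
    7: "MESSAGE IN",
}

def phase_code(msg, cd, io):
    """Combine the three signal states (truthy = asserted) into a code."""
    return (int(bool(msg)) << 2) | (int(bool(cd)) << 1) | int(bool(io))

def phase_name(code):
    """Name of a phase code; codes 4 and 5 are reserved."""
    return PHASE_NAMES.get(code, "RESERVED")

print(phase_name(phase_code(msg=0, cd=1, io=0)))  # prints COMMAND
print(phase_name(phase_code(msg=0, cd=1, io=1)))  # prints STATUS
```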

Example of possible cause (1) due to hardware failure:

In order to go from DATA OUT phase to STATUS phase, BOTH the C/D and
I/O signals have to be asserted (set to logical 1 = 0 volts).
A new phase becomes valid when the target asserts the REQ signal.

Suppose the device wanted to go from DATA OUT phase to STATUS phase, but:

- the NCR saw a spurious REQ signal while I/O was not yet asserted, or
- the NCR wrongly read the I/O bit as 0 when REQ was asserted,
then the NCR could have wrongly detected a valid COMMAND PHASE.
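
The scenario above can be modelled in a few lines of Python. This is only a toy model of the reasoning, not of the chip: `latched_phase` and its `io_seen_low` flag are hypothetical names, and the fault is simply represented as I/O being read as 0 at the moment REQ is seen.

```python
PHASES = {0: "DATA OUT", 2: "COMMAND", 3: "STATUS"}  # codes listed in the post

def latched_phase(cd, io, io_seen_low=False):
    """Phase code the initiator latches when it sees REQ.

    io_seen_low models the fault: I/O is read as 0 even though the target
    is driving it (hypothetical flag, for illustration only).
    """
    seen_io = 0 if io_seen_low else int(bool(io))
    return (int(bool(cd)) << 1) | seen_io

# The target intends STATUS phase: C/D = 1 and I/O = 1.
glitched = latched_phase(cd=1, io=1, io_seen_low=True)  # I/O misread as 0
clean = latched_phase(cd=1, io=1)                       # read correctly later

print(f"phase change {glitched}-{clean}: {PHASES[glitched]} -> {PHASES[clean]}")
# prints: phase change 2-3: COMMAND -> STATUS
```

With I/O misread once, the initiator sees COMMAND (2) first and STATUS (3) on the next correct sample, which matches the '2-3' in the reported message.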

There are probably many other possible causes that could explain the
problem, but I would be *very* surprised if the cause were a driver problem.

> > scsi : aborting command due to timeout : pid 105389, scsi0, channel 0, id 2, lun 0 Write (10) 00 00 43 1c c2 00 00 02 00 
> > ncr53c8xx_abort: pid=105389 serial_number=105409 serial_number_at_timeout=105409
> > scsi : aborting command due to timeout : pid 105390, scsi0, channel 0, id 0, lun 0 Write (6) 0a b2 a8 02 00 
> > ncr53c8xx_abort: pid=105390 serial_number=105410 serial_number_at_timeout=105410
> > ncr53c875-0: abort ccb=c02fc020 (skip)
> > SCSI host 0 abort (pid 105390) timed out - resetting
> > SCSI bus is being reset for host 0 channel 0.


> Much deleted.  Is it just me or is this a problem with the 
> ncr53c8xx drivers.  I get this the odd time on linux-2.0.28 and 
> I've seen numerous people complain about the same thing.  The odd 
> time the system can't be revived and one must hit the ol red switch.

The only message that is specific to the ncr53c8xx driver is the
'phase change 2-3'. This problem can be explained by numerous hardware
problems or by incorrect target behaviour. At least for the moment, I
do not have any idea how a driver bug could cause such an error.
All other messages are due to command timeouts/resets requested by the
middle-level SCSI driver, and one can get identical messages using ANY
SCSI driver and for NUMEROUS causes. These timeouts are very probably
due to the weird phase error that confused the ncr53c8xx driver.
 
> The funny thing is I really don't see this complaint unless it's 
> someone using the ncr53c8xx drivers.

I assume that the complaint you mentioned is the 'phase change 2-3',
since all other messages are common SCSI error messages.
Could you send me pointers to the corresponding postings?
Any kind of error report involving the ncr53c8xx driver is helpful to me.


Gerard.


