[646] in linux-scsi channel archive


Re: AHA-1740 SCSI errors

daemon@ATHENA.MIT.EDU (Michael Weller)
Sun Oct 1 22:41:31 1995

Date: Mon, 2 Oct 1995 01:28:23 +0100 (MEZ)
From: Michael Weller <eowmob@exp-math.uni-essen.de>
To: Jon Lewis <jlewis@inorganic5.chem.ufl.edu>
Cc: linux-scsi@vger.rutgers.edu
In-Reply-To: <9510020010.AA03742@pollux.exp-math.uni-essen.de>

> Date: Sun, 1 Oct 1995 16:19:40 -0400 (EDT)
> From: Jon Lewis <jlewis@inorganic5.chem.ufl.edu>
> To: linux-scsi@vger.rutgers.edu
> Subject: AHA-1740 SCSI errors 
> 
> Does anyone know what might be causing the following errors in 2 P90 systems?
> System 1 is a news server (full feed), system 2 is a mail server and 
> shell access system.  Both have Adaptec 1740's and Micropolis drives:
> MICROP    Model: 3243-19MZ  Q4D    Rev: HT02

I run a 1742 with an IBM 1GB disk, a Fujitsu 1GB disk, an Archive tape, a 
Toshiba cdrom and 1.2.6, with no problems at all.

There might be some problems reintroduced in the later 1.2.x releases but 
I doubt it.

However this is my private system which is probably less stressed than yours.

> The first also has a Conner 30540, the second a SCSI DAT drive.
> Both seem to get large numbers of the same sort of errors, but the news 
> server has a tendency to lock up roughly weekly, while the other can go 
> weeks at a time with no serious problems.

Probably this is just because the news server stresses its disk more. 
Also, the fs code has a tendency to lock up the machine as a result of 
fatal disk errors.

> Since the errors come from all the SCSI devices, I'm guessing it's either 
> a cabling or driver problem.  I doubt both controllers are bad, but we 
> are using long cheap SCSI cables.  Anyone have any ideas?  I'm 
            ===============

Why are you posing questions and then answering them yourself? Might it even 
be that these are cables to external devices? A friend of mine sells servers
and they have a bunch of (non-Linux) problems with this kind of thing. SCSI-2
is very sensitive to bad cables. Standard external cables are usually not
good enough. Get (expensive) certified SCSI-2 cables. Ensure proper
termination and proper termination power. Consider active terminators
whenever possible. I've also heard that even the old-style big 50-pin
Centronics plugs on external cases may cause problems with SCSI-2; try to
get cases with the small SCSI-2 plugs.

For internal cabling it is less critical (the ribbon cables are fairly 
good, as are the pressed-on connectors). Still, avoid open, unused 
connectors and keep cables as short as possible.

> considering trying an NCR as a replacement since they're cheap and fast.

I think they're SCSI-2 as well and thus just as sensitive.

> System 1: (running 1.2.11)
> 
> scsi : aborting command due to timeout : pid 1293917, scsi0, id 0, lun 0 
> 	Write (6) 0b 1f 02 30 00 
[...]
The timeouts look like lost commands: either the disk never got the command, 
or it didn't accept it because of wrong parity (but the Adaptec should 
signal that).
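
To check whether the timeouts really are spread over every device on the bus
(which would point at cabling or termination rather than at one drive), they
could be tallied per target. What follows is only a minimal sketch in Python,
assuming the kernel messages have exactly the format of the line quoted
above; the function name count_timeouts and the sample lines (including
their pids) are made up for illustration.

```python
import re
from collections import Counter

# Matches kernel messages of the form quoted above, e.g.
#   scsi : aborting command due to timeout : pid 1293917, scsi0, id 0, lun 0
TIMEOUT_RE = re.compile(
    r"scsi : aborting command due to timeout : "
    r"pid (\d+), scsi(\d+), id (\d+), lun (\d+)"
)


def count_timeouts(lines):
    """Return a Counter mapping (host, id, lun) to the number of timeouts."""
    counts = Counter()
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            _pid, host, target, lun = (int(g) for g in match.groups())
            counts[(host, target, lun)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample; in practice, feed in the lines of the kernel log.
    sample = [
        "scsi : aborting command due to timeout : pid 1293917, scsi0, id 0, lun 0",
        "scsi : aborting command due to timeout : pid 1293920, scsi0, id 2, lun 0",
        "scsi : aborting command due to timeout : pid 1293931, scsi0, id 0, lun 0",
    ]
    for (host, target, lun), n in sorted(count_timeouts(sample).items()):
        print(f"scsi{host} id {target} lun {lun}: {n} timeout(s)")
```

If the counts are spread more or less evenly over all ids, the bus itself is
the likelier suspect; if one id dominates, look at that drive first.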

Yes, I'd say it looks like either cabling/termination problems or a very
broken SCSI drive (but two toasty drives? The lousy cabling seems to be 
the common denominator of both toasty systems).
However, someone else might have other experiences to share?

Michael.

(eowmob@exp-math.uni-essen.de or  eowmob@pollux.exp-math.uni-essen.de
Please do not use my vm or de0hrz1a accounts anymore. In case of real
problems reaching me try mat42b@aixrs1.hrz.uni-essen.de instead.)

