[8400] in linux-scsi channel archive

Re: 2.2.13 domain validation causes errors ?

daemon@ATHENA.MIT.EDU (Kurt Garloff)
Sat Mar 18 10:56:10 2000

Date:	Sat, 18 Mar 2000 09:11:20 +0100
From: Kurt Garloff <garloff@suse.de>
To: "Michael Stumpf" <michael@stumpf.de>
Cc: <linux-scsi@vger.rutgers.edu>
Message-ID: <20000318091120.B17009@mobil.tue.nl>
Mail-Followup-To: "Michael Stumpf" <michael@stumpf.de>,
	<linux-scsi@vger.rutgers.edu>
Mime-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-md5;
	protocol="application/pgp-signature"; boundary="neYutvxvOLaeuPCA"
In-Reply-To: <000101bf9069$d47d23c0$6500a8c0@lokal>; from michael@stumpf.de on Sat, Mar 18, 2000 at 12:38:00AM +0100


--neYutvxvOLaeuPCA
Content-Type: text/plain; charset=us-ascii

On Sat, Mar 18, 2000 at 12:38:00AM +0100, Michael Stumpf wrote:
> I have a problem with a SCSI hard disk IBM DCAS-34330 connected to an
> AHA-2940.
> I think the error is caused by the domain validation.
> Then it did this "Domain validation" and immediately afterwards the errors
> occurred.
> No login was possible anymore.
> 
> 1. What is this domain validation ?

The controller tests whether the negotiated speed can be used safely and
reduces the speed if it cannot. AFAIK, this is done by writing data to the
device's buffer and reading it back.
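
To make the idea concrete, here is a toy simulation of that write/read-back
loop. It is only a sketch: it performs no SCSI I/O, and every name in it
(FakeDevice, domain_validate, the rate list) is made up for illustration and
not taken from the aic7xxx driver.

```python
# Toy simulation of domain validation; no real SCSI I/O is performed.
# All names (FakeDevice, domain_validate, the rate list) are hypothetical
# and are not taken from the aic7xxx driver.

PATTERN = bytes([0x00, 0xFF, 0x55, 0xAA] * 16)  # 64-byte test pattern


class FakeDevice:
    """Stands in for a disk buffer behind a possibly marginal bus."""

    def __init__(self, max_reliable_mb_s):
        self.max_reliable_mb_s = max_reliable_mb_s
        self.buffer = b""

    def write_buffer(self, data, rate_mb_s):
        # Above the reliable rate, flip one bit to mimic a transfer error.
        if rate_mb_s > self.max_reliable_mb_s:
            data = bytes([data[0] ^ 0x01]) + data[1:]
        self.buffer = data

    def read_buffer(self, rate_mb_s):
        return self.buffer


def domain_validate(device, rates_mb_s):
    """Return the fastest rate at which the pattern reads back intact."""
    for rate in sorted(rates_mb_s, reverse=True):
        device.write_buffer(PATTERN, rate)
        if device.read_buffer(rate) == PATTERN:
            return rate
    return None  # even the slowest rate failed


if __name__ == "__main__":
    print(domain_validate(FakeDevice(10.0), [20.0, 10.0, 5.0]))  # 10.0
```

The simulated device corrupts data above 10 MB/s, so the loop falls back from
20 to 10 and stays there.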

> 2. Is there a bugfix available ?

First a bug has to be spotted. Maybe Doug knows about one?

There's the possibility that the device mixes up the data written to the
buffer by WRITE_BUFFER with the data it holds there while reading from or
writing to the disk. That could be either a firmware or a driver bug.

> 3. Is it a hardware problem ?
>    (the system was previously running under novell 3.2 w/o any problems)

Could be. The domain validation was most probably triggered by some sort of
error, most likely a parity error.

> 4. Is there a way to force the Domain validation. -> reproducing the bug ?

Don't know.

> Mar 16 23:46:41 linux kernel: (scsi0:0:0:0) Performing Domain validation.
> Mar 16 23:46:41 linux kernel: (scsi0:0:0:0) Successfully completed Domain
> validation.
> Mar 16 23:46:57 linux kernel: attempt to access beyond end of device
> Mar 16 23:46:57 linux kernel: 08:03: rw=0, want=764136424, limit=1049600
> Mar 16 23:46:57 linux kernel: dev 08:03 blksize=4096 blocknr=1801646841
> sector=1528272840 size=4096 count=1
> ...

Looks like filesystem corruption to me. A block with no. 764136424 certainly
does not exist, but the filesystem most probably points to such a block,
which is why the driver tries to read it.
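
The numbers in the quoted log can be cross-checked with a little arithmetic.
The sketch below assumes that the kernel keeps the sector number in a 32-bit
unsigned value and reports want/limit in 1 KiB units; under those assumptions
the logged sector and want values follow directly from the bogus blocknr.

```python
# Values copied from the quoted kernel log and partition table.
blocknr = 1801646841
blksize = 4096
limit_kib = 1049600  # size of 08:03 (major 8, minor 3 = sda3) in 1 KiB blocks

SECTOR = 512
sectors_per_block = blksize // SECTOR  # 8 sectors per 4 KiB block

# Assumption: the sector number wraps around at 32 bits.
sector = (blocknr * sectors_per_block) % 2**32

# Assumption: "want" is the end of the request in 1 KiB units (2 sectors each).
want_kib = (sector + sectors_per_block) // 2

print(sector)                # 1528272840, matches sector= in the log
print(want_kib)              # 764136424, matches want= in the log
print(want_kib > limit_kib)  # True: far beyond the end of the partition
```

So the log is internally consistent: the request itself is computed correctly,
it is the block number handed to the driver that is garbage.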

The question is what was causing it.

Did an e2fsck run fix this problem? Or did it find no errors in your fs?
The latter would point to memory corruption ...

> /dev/sda3           139      1163   1049600   83  Linux
> Host: scsi0 Channel: 00 Id: 00 Lun: 00
>   Vendor: IBM      Model: DCAS-34330       Rev: S65A
> Adaptec AIC7xxx driver version: 5.1.20/3.2.4
> (scsi0:0:0:0)
>   Device using Narrow/Sync transfers at 10.0 MByte/sec, offset 15
>   Transinfo settings: current(25/15/0/0), goal(25/15/0/0), user(25/15/0/0)
>   Total transfers 2245609 (1555107 reads and 690502 writes)

Nothing unusual, AFAICT.
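
As a sanity check on the quoted transfer settings, the reported 10.0 MByte/sec
can be derived from the first Transinfo field. This sketch assumes that field
is the SCSI synchronous period factor (25 here) and that the period is the
factor times 4 ns, which holds for factors of 25 and above; faster settings
use special-cased values that the function below (a made-up helper) does not
handle.

```python
def sync_rate_mb_s(period_factor, width_bytes=1):
    """Synchronous transfer rate in MByte/sec for a given period factor.

    Assumption: period = factor * 4 ns. This sketch only accepts
    factors >= 25 (Fast-10 and slower).
    """
    if period_factor < 25:
        raise ValueError("factors below 25 are not handled by this sketch")
    period_ns = period_factor * 4
    return 1000.0 / period_ns * width_bytes


print(sync_rate_mb_s(25))  # 10.0 for a narrow (1-byte) bus
```

A 100 ns period on a narrow bus gives 10 MByte/sec, which agrees with the
"Narrow/Sync transfers at 10.0 MByte/sec" line.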

Regards,
-- 
Kurt Garloff  <garloff@suse.de>                          Eindhoven, NL
GPG key: See mail header, key servers         Linux kernel development
SuSE GmbH, Nuernberg, FRG                               SCSI, Security

--neYutvxvOLaeuPCA
Content-Type: application/pgp-signature

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.1c- (GNU/Linux)
Comment: For info see http://www.gnupg.org

iD8DBQE40zonxmLh6hyYd04RAd1NAJ9/U5RWogJYisBRjdHZgWeyNN8htwCfZclD
COeI+/e79DGiBWZDwlJo8vc=
=oGZT
-----END PGP SIGNATURE-----

--neYutvxvOLaeuPCA--
