[7606] in linux-scsi channel archive


2.3.29 fails to recover from SCSI errors

daemon@ATHENA.MIT.EDU (Neil Brown)
Thu Dec 2 20:24:55 1999

From: Neil Brown <neilb@cse.unsw.edu.au>
To: linux-scsi@vger.rutgers.edu
Date:   Fri, 3 Dec 1999 11:58:43 +1100 (EST)
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Message-ID: <14407.5571.997000.207767@notabene.cse.unsw.EDU.AU>
Cc: linux-kernel@vger.rutgers.edu


Hi,
 I am having problems with 2.3.29 (and 2.3.28 at least) with SCSI errors
 causing the machine to lock up.

 After a bunch of errors like:

Dec  3 11:30:59 glass kernel: scsi : aborting command due to timeout : pid 188131, scsi0, channel 0, id 1, lun 0 Read (10) 00 00 37 3f 53 00 00 02 00  
Dec  3 11:30:59 glass kernel: scsi : aborting command due to timeout : pid 188136, scsi0, channel 0, id 1, lun 0 Read (6) 0e 86 d3 10 00  
Dec  3 11:30:59 glass kernel: scsi : aborting command due to timeout : pid 188134, scsi0, channel 0, id 1, lun 0 Write (6) 01 3f 51 02 00  
Dec  3 11:31:00 glass kernel: SCSI host 0 abort (pid 188131) timed out - resetting 
Dec  3 11:31:00 glass kernel: SCSI bus is being reset for host 0 channel 0. 
Dec  3 11:31:00 glass kernel: (scsi0:0:1:0) Performing Domain validation. 
Dec  3 11:31:00 glass kernel: SCSI host 0 abort (pid 188131) timed out - resetting 
Dec  3 11:31:00 glass kernel: SCSI bus is being reset for host 0 channel 0. 
Dec  3 11:31:00 glass kernel: scsi : aborting command due to timeout : pid 188236, scsi0, channel 0, id 3, lun 0 Read (6) 10 07 d7 02 00  
Dec  3 11:31:01 glass kernel: scsi : aborting command due to timeout : pid 188244, scsi0, channel 0, id 0, lun 0 Read (6) 06 02 a1 02 00  
Dec  3 11:31:02 glass kernel: SCSI host 0 channel 0 reset (pid 188131) timed out - trying harder 
Dec  3 11:31:02 glass kernel: SCSI bus is being reset for host 0 channel 0. 
Dec  3 11:31:03 glass kernel: SCSI host 0 reset (pid 188131) timed out again - 
Dec  3 11:31:03 glass kernel: probably an unrecoverable SCSI bus or device hang. 
Dec  3 11:31:04 glass kernel: (scsi0:0:0:0) Synchronous at 80.0 Mbyte/sec, offset 15. 
Dec  3 11:31:04 glass kernel: (scsi0:0:1:0) Successfully completed Domain validation. 
Dec  3 11:31:04 glass kernel: (scsi0:0:3:0) Synchronous at 80.0 Mbyte/sec, offset 15. 
Dec  3 11:31:04 glass kernel: (scsi0:0:1:0) Synchronous at 80.0 Mbyte/sec, offset 15. 
Dec  3 11:31:04 glass kernel: (scsi0:0:1:0) Performing Domain validation. 
Dec  3 11:31:04 glass kernel: (scsi0:0:1:0) Successfully completed Domain validation. 

All the processes that were accessing the SCSI discs are hung in a D
wait (WCHAN "wait_on_buffer" or "wait_on_page" or "down" or the like).
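
A quick way to collect that list is to filter ps output for state D and
record each process's WCHAN. The following is only a sketch: it assumes a
procps-style ps that accepts "-eo pid,stat,wchan:32,comm" (the ps shipped
with a given system may differ), and the names blocked_processes and
snapshot are hypothetical.

```python
import subprocess

def blocked_processes(ps_output):
    """Return (pid, wchan, command) for every process in uninterruptible sleep."""
    blocked = []
    for line in ps_output.splitlines()[1:]:  # first line is the column header
        fields = line.split(None, 3)
        if len(fields) < 4:
            continue
        pid, stat, wchan, comm = fields
        if stat.startswith("D"):  # D = uninterruptible (usually I/O) wait
            blocked.append((int(pid), wchan, comm.strip()))
    return blocked

def snapshot():
    """Run ps (assumed procps syntax) and return the currently blocked processes."""
    out = subprocess.run(
        ["ps", "-eo", "pid,stat,wchan:32,comm"],
        capture_output=True, text=True, check=True,
    ).stdout
    return blocked_processes(out)
```

Calling snapshot() a few seconds apart and comparing the results shows
whether the same PIDs stay stuck in the same wait channel.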

The machine isn't completely hung, as I can still log in, and a reboot
sometimes succeeds after successfully unmounting the filesystems
(though sometimes it hangs while "unmounting filesystems").

It looks like a failed request is getting lost and never completed,
either successfully or otherwise.

If anyone has any suggestions - a fix maybe, or some pointers on where
to look or how to instrument my kernel to get more details - I would
appreciate it.

In case it is relevant, I have a dual Pentium II-350, an Adaptec 2940-U2W
host adapter, and three 18GB Seagate LVD drives.
The drives have ext2 filesystems and are being accessed by knfsd.

NeilBrown

