[2164] in linux-scsi channel archive
Driver bug ?
daemon@ATHENA.MIT.EDU (m.j.s.vandoesburg@student.utwente.)
Wed Jul 16 23:05:57 1997
From: m.j.s.vandoesburg@student.utwente.nl
Date: Tue, 15 Jul 1997 11:10:18 +0200
To: linux-scsi@vger.rutgers.edu
System: Linux 2.0.30 (also 1.2.13 and 2.0.28)
(Linux and BSD NCR810 driver)
adapter: NCR810 (active termination) (Symbios 53C810 chip)
Harddisk: HP C3325A Rev: 6066
Termination: OK
Cable: 50cm and OK
Other devices on SCSI bus: none
Problem: My hard disk crashes when I run a (faulty) program.
When I run a program that I know has a bug in it (it creates a
doubly linked list and uses the pointer to an object after calling free)
my hard disk crashes, 100% of the time. The first time it happened was
under kernel 1.2.13 (with a different program from the one I currently
use to crash my system), but back then I was not able to reproduce the
problem. Now I can.
What happens?
The driver tries to reset the drive. Sometimes this succeeds and the computer
continues. Most of the time the drive just locks up and I have to cut the
power to get started again (the reset button resets the computer, but
it then hangs when the NCR810 tries to detect the HD).
The things I've tried:
1. Changed the system memory
2. Tried another NCR810 controller, this one used passive termination.
(also a SYMBIOS 53C810)
3. Tried another video adapter (I got a new one just before
the problem started occurring on a daily basis)
4. Put the driver in FAST SCSI2, Sync mode etc., in all combinations
5. Put the drive in SCSI (not SCSI2) mode (changed jumper on drive)
6. Used the BSD instead of the Linux driver.
7. Pulled the power plug out of the drive and reinserted it. (very
hard reset of the drive)
8. Tried to trace the bug by enabling the debug messages, but when I do
this the system won't crash. (I could fix the problem by always
producing debug messages and sending them to /dev/null :-)
9. Changed the SCSI ID of the drive (and didn't expect it to work)
Any help would be ... helpful :-)
-------------------------- /var/adm/syslog (logged on remote computer)
The return code varies; I've seen a0 and 93 (nothing else).
SCSI disk error : host 0 channel 0 id 0 lun 0 return code = a0
The sector at which the error occurs varies (any value)
The driver did not complain about the revision with the other NCR810
controller (which also crashed).
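For reading the return codes a0 and 93 mentioned above, here is a rough
sketch that splits the value into its fields. The field layout is an
assumption (status in bits 1-5, then message, host and driver bytes, as I
understand the status_byte/msg_byte/host_byte/driver_byte macros in the
SCSI mid-layer), and decode_result is a made-up helper name, not anything
from the kernel.

```python
# Sketch: split a Linux SCSI "return code" into its four fields.
# ASSUMPTION: the layout follows the mid-layer's status_byte/msg_byte/
# host_byte/driver_byte macros; decode_result is a hypothetical helper.

def decode_result(result):
    """Return the fields packed into a 32-bit SCSI result word."""
    return {
        "status": (result >> 1) & 0x1F,   # SCSI status, shifted right by one
        "msg": (result >> 8) & 0xFF,      # message byte from the target
        "host": (result >> 16) & 0xFF,    # host adapter (low-level driver) code
        "driver": (result >> 24) & 0xFF,  # mid-level driver code
    }

for code in (0xA0, 0x93):
    print("return code = %02x ->" % code, decode_result(code))
```

With that layout, a0 gives a status field of 16, which agrees with the
"status byte = 16" line in the log; the message, host and driver fields
are all zero for both codes.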
Jul 7 21:10:48 flipje kernel: scsi-ncr53c7,8xx : at PCI bus 0, device 19, function 0
Jul 7 21:10:48 flipje kernel: scsi-ncr53c7,8xx : warning : revision of 18 is greater than 2.
Jul 7 21:10:48 flipje kernel: scsi-ncr53c7,8xx : NCR53c810 at memory 0xe1000000, io 0x6000, irq 11
Jul 7 21:10:48 flipje kernel: scsi0 : burst length 8
Jul 7 21:10:48 flipje kernel: scsi0 : reset ccf to 3 from 0
Jul 7 21:10:48 flipje kernel: scsi0 : NCR code relocated to 0x26c600 (virt 0x0026c600)
Jul 7 21:10:48 flipje kernel: scsi0 : test 1 started
Jul 7 21:10:48 flipje kernel: scsi0 : NCR53c{7,8}xx (rel 17)
Jul 7 21:10:48 flipje kernel: scsi : 1 host.
Jul 7 21:10:48 flipje kernel: scsi0 : target 0 accepting asynchronous SCSI
Jul 7 21:10:48 flipje kernel: scsi0 : setting target 0 to asynchronous SCSI
Jul 7 21:10:48 flipje kernel: Vendor: HP Model: C3325A Rev: 6066
Jul 7 21:10:49 flipje kernel: Type: Direct-Access ANSI SCSI revision: 02
Jul 7 21:10:49 flipje kernel: Detected scsi disk sda at scsi0, channel 0, id 0, lun 0
Jul 7 21:10:49 flipje kernel: scsi : detected 1 SCSI disk total.
Jul 7 21:10:49 flipje kernel: SCSI device sda: hdwr sector= 512 bytes. Sectors= 4238836 [2069 MB] [2.1 GB]
Jul 7 21:10:49 flipje kernel: Partition check:
Jul 7 21:10:49 flipje kernel: sda: sda1 sda2 sda3
.
.
.
Jul 7 21:12:46 flipje kernel: Internal error scsi.c 1695
Jul 7 21:12:47 flipje kernel: status byte = 16
Jul 7 21:12:47 flipje kernel: SCSI disk error : host 0 channel 0 id 0 lun 0 return code = a0
Jul 7 21:12:47 flipje kernel: scsidisk I/O error: dev 08:01, sector 1682522
Jul 7 21:13:01 flipje kernel: scsi : aborting command due to timeout : pid 14475, scsi0, channel 0, id 0, lun 0 Write (6) 19 ac a9 10 00
Jul 7 21:13:01 flipje kernel: scsi0 : DANGER : command running, can not abort.
Jul 7 21:13:01 flipje kernel: scsi : aborting command due to timeout : pid 14476, scsi0, channel 0, id 0, lun 0 Write (6) 19 ac b9 10 00
Jul 7 21:13:02 flipje kernel: scsi0 : found command 14476 in Linux issue queue
Jul 7 21:13:02 flipje kernel: scsi : aborting command due to timeout : pid 14477, scsi0, channel 0, id 0, lun 0 Write (6) 19 ac 9b 0e 00
Jul 7 21:13:02 flipje kernel: scsi0 : found command 14477 in Linux issue queue
Jul 7 21:13:20 flipje kernel: scsi : aborting command due to timeout : pid 14475, scsi0, channel 0, id 0, lun 0 Write (6) 19 ac a9 10 00
Jul 7 21:13:20 flipje kernel: scsi0 : DANGER : command running, can not abort.
Jul 7 21:13:21 flipje kernel: SCSI host 0 abort (pid 14475) timed out - resetting
Jul 7 21:13:21 flipje kernel: SCSI bus is being reset for host 0 channel 0.
Jul 7 21:13:21 flipje kernel: scsi0 : DCMD|DBC=0x820b0000, DNAD=0x26c85c (virt 0x0026c85c)
Jul 7 21:13:21 flipje kernel: DSA=0x941e8 (virt 0x000941e8)
Jul 7 21:13:21 flipje kernel: DSPS=0x26c84c, TEMP=0x940ac (virt 0x000940ac), DMODE=0x88
Jul 7 21:13:21 flipje kernel: SXFER=0x0, SCNTL3=0x3
Jul 7 21:13:21 flipje kernel: BSY phase=CMDOUT, 0 bytes in SCSI FIFO
Jul 7 21:13:21 flipje kernel: STEST0=0x47
Jul 7 21:13:21 flipje kernel: scsi0 : DSP 0x26c854 (virt 0x0026c854) ->
Jul 7 21:13:21 flipje kernel: 0x26c854 (virt 0x0026c854) : 0x820b0000 0x0026c84c (virt 0x0026c84c)
Jul 7 21:13:21 flipje kernel: 0x26c85c (virt 0x0026c85c) : 0x8f0b0000 0x0026ca04 (virt 0x0026ca04)
Jul 7 21:13:21 flipje kernel: 0x26c864 (virt 0x0026c864) : 0x9e0b0000 0x00000000 (virt 0x00000000)
Jul 7 21:13:21 flipje kernel: 0x26c86c (virt 0x0026c86c) : 0x800b0000 0x0026c88c (virt 0x0026c88c)
Jul 7 21:13:21 flipje kernel: 0x26c874 (virt 0x0026c874) : 0x810b0000 0x0026c8e4 (virt 0x0026c8e4)
Jul 7 21:13:21 flipje kernel: 0x26c87c (virt 0x0026c87c) : 0x830b0000 0x0026cc0c (virt 0x0026cc0c)
Jul 7 21:13:21 flipje kernel: scsi0 : connected (SDID=0x0, SSID=0x0)
Jul 7 21:13:21 flipje kernel: scsi0 : dsa at phys 0x941e8 (virt 0x000941e8)
Jul 7 21:13:21 flipje kernel: + 64 : dsa_msgout length = 1, data = 0x9402c (virt 0x0009402c)
Jul 7 21:13:21 flipje kernel: Identify disconnect not allowed lun 0
Jul 7 21:13:21 flipje kernel: + 60 : select_indirect = 0x3000000
Jul 7 21:13:21 flipje kernel: + 56 : dsa_cmnd = 0x4410 result = 0xffff, target = 0, lun = 0, cmd = Write (6) 19 ac a9 10 00
Jul 7 21:13:21 flipje kernel: + 48 : dsa_next = 0x0
Jul 7 21:13:21 flipje kernel: scsi0 target 0 : sxfer_sanity = 0x0, scntl3_sanity = 0x3
Jul 7 21:13:22 flipje kernel: script : 0x78030300 0x0 0x78050000 0x0 0x90080000 0x0 0x0 0x0
Jul 7 21:13:22 flipje kernel: scsi0 : saved data pointer at offset 0
Jul 7 21:13:22 flipje kernel: scsi0 : active data pointer at offset 0
Jul 7 21:13:22 flipje kernel: scsi0 : issue queue
Jul 7 21:13:22 flipje kernel: scsi0 : dsa at phys 0x92080 (virt 0x00092080)
Jul 7 21:13:22 flipje kernel: + 64 : dsa_msgout length = 2541028, data = 0x0 (virt 0x00000000)
Jul 7 21:13:22 flipje kernel: + 60 : select_indirect = 0xc0000004
Jul 7 21:13:22 flipje kernel: + 56 : dsa_cmnd = 0x0
Jul 7 21:13:22 flipje kernel: + 48 : dsa_next = 0x0
Jul 7 21:13:22 flipje kernel: scsi0 : dsa at phys 0x90080 (virt 0x00090080)
Jul 7 21:13:22 flipje kernel: + 64 : dsa_msgout length = 2541028, data = 0x0 (virt 0x00000000)
Jul 7 21:13:22 flipje kernel: + 60 : select_indirect = 0xc0000004
Jul 7 21:13:22 flipje kernel: + 56 : dsa_cmnd = 0x0
Jul 7 21:13:22 flipje kernel: + 48 : dsa_next = 0x0
Jul 7 21:13:22 flipje kernel: scsi0 : schedule dsa array :
Jul 7 21:13:22 flipje kernel: scsi0 : end schedule dsa array
Jul 7 21:13:22 flipje kernel: scsi0 : reconnect_dsa_head :
Jul 7 21:13:22 flipje kernel: scsi0 : end reconnect_dsa_head
Jul 7 21:13:22 flipje kernel: The sti() implicit in a printk() prevents hangs
Jul 7 21:13:22 flipje kernel: scsi : aborting command due to timeout : pid 14476, scsi0, channel 0, id 0, lun 0 Write (6) 19 ac b9 10 00
Jul 7 21:13:35 flipje kernel: scsi0 : did this command ever run?
Jul 7 21:13:35 flipje kernel: scsi : aborting command due to timeout : pid 14477, scsi0, channel 0, id 0, lun 0 Write (6) 19 ac 9b 0e 00
Jul 7 21:13:36 flipje kernel: scsi0 : did this command ever run?
Jul 7 21:13:36 flipje kernel: scsi : aborting command due to timeout : pid 14475, scsi0, channel 0, id 0, lun 0 Write (6) 19 ac a9 10 00
Jul 7 21:13:36 flipje kernel: scsi0 : DANGER : command running, can not abort.
Jul 7 21:13:36 flipje kernel: SCSI host 0 abort (pid 14475) timed out - resetting
Jul 7 21:13:36 flipje kernel: SCSI bus is being reset for host 0 channel 0.
It will keep doing this until the power is cut.
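
To see which blocks the timed-out commands were writing, the bytes printed
after "Write (6)" can be decoded by hand or with a small script. This is
only a sketch: it assumes the kernel prints the opcode name followed by
the remaining five CDB bytes (LUN plus high address bits, two more address
bytes, transfer length, control), and decode_write6 is a made-up helper
name.

```python
# Sketch: decode the bytes printed after "Write (6)" in the log above.
# ASSUMPTION: the five values are CDB bytes 1..5 of a WRITE(6) command
# (LUN/LBA-high, LBA-mid, LBA-low, transfer length, control).
# decode_write6 is a hypothetical helper, not a kernel function.

def decode_write6(text):
    """Decode a string such as '19 ac a9 10 00' into (lun, lba, blocks)."""
    b = [int(x, 16) for x in text.split()]
    if len(b) != 5:
        raise ValueError("expected 5 bytes after the opcode")
    lun = b[0] >> 5                                   # top 3 bits of byte 1
    lba = ((b[0] & 0x1F) << 16) | (b[1] << 8) | b[2]  # 21-bit block address
    blocks = b[3] if b[3] != 0 else 256               # length 0 means 256 blocks
    return lun, lba, blocks

for pid, cdb in ((14475, "19 ac a9 10 00"),
                 (14476, "19 ac b9 10 00"),
                 (14477, "19 ac 9b 0e 00")):
    lun, lba, blocks = decode_write6(cdb)
    print("pid %d: lun %d, blocks %d..%d" % (pid, lun, lba, lba + blocks - 1))
```

Under that assumption the three commands cover one contiguous run of
blocks, 1682587 through 1682632, all on lun 0.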