[1285] in linux-scsi channel archive

Problems with MO-Drive

daemon@ATHENA.MIT.EDU (Michael Ulbrich)
Mon Jan 20 17:10:58 1997

Date: 	Mon, 20 Jan 97 22:40:19 +0100
From: mul@lab1033.berlin.ptb.de (Michael Ulbrich)
To: linux-scsi@vger.rutgers.edu
CC: mul@lab1033.berlin.ptb.de

Hi folks,

I'm running kernel 1.2.13 with two SCSI host adapters: an AHA 1542B and an
NCR53C815. Booting is done from the AHA; the NCR driver(s) are loaded as
modules when I need the second controller. As NCR drivers I tried
ncr53c7,8xx.o and ncr53c8xx.o (ncrBsd2Linux-1.12a, 1.14c and 1.16e).

Problem description:

I'm working with 5.25" MODs in two formats: one format is handled by a
Pioneer drive (proprietary), the other by a Pinnacle (ISO: 600MB / 1.2GB).
The unusual thing about these MODs is that they seem to be written in a
non-contiguous manner: between areas with written sectors there are areas
where the medium seems to be blank.

Accessing these (blank?) sectors with the MO drive connected to the AHA
leads to a read error with additional information: Medium Error.
After this error message appears in the syslog, the Adaptec driver returns
and I can try to read the next sector.

The NCR driver(s) behave differently (buggy?):
Trying to read the blank sectors leaves the SCSI bus hanging. The syslog
messages show that the driver tries to restart (reset SCSI bus) for the
"second half of retries" and that's it. I have to reboot the machine
because rmmod of the driver does not work (busy).

This is the log of the failing READ (full debugging in low- and midlevel
drivers):

<6>SMalloc: 512 2140c0 scsi_do_cmd (host = 1, target = 4, buffer =002140c0,
        bufflen = 1024, done = 001a454c, timeout = 600, retries = 5)
<6>command : 08  00  00  04  02  00  1a  00  e0  ec  
<6>internal_cmnd (host = 1, target = 4, command = 00757a04, buffer =  002140c0, 
<6>bufflen = 1024, done = 001a454c)
<6>queuecommand : routine at 0303ea74
<6>ncr53c815-0-<4,0>: CMD=8 ncr53c815-0: queuepos=1 tryoffset=640.
<6>leaving internal_cmnd()
<6>Leaving scsi_do_cmd()

<snip: scsi0 log>

<6>[<2|0:c0|27dae8:19000002>P13 RL=2 D=0 SS0=1 
<6>CP=0027a018 CP2=0027a018 DSP=27dae8 NXT=27dae8 VDSP=0027dae0 CMD=19 OCMD=19
<6>TBLP=0027a08c OLEN=10 OADR=757a68
<6>ncr53c815-0-<4,0>: newcmd[0] 9000002 757a76 80080000 27dae8.
<6>]
<6>[F CCB=18 STAT=4/80
<6>
<6>ncr53c815-0: sense data: 70 0 3 0 0 0 0 6 0 0 0 0 11 1.
<6>Non-zero result in scsi_done 2 4:0
<6>In scsi_done(host = 1, result = 000002)
<6>scsi1 : extra data not valid Current error 820: sense key Medium Error
<6>Additional sense indicates Read retries exhausted
<6>
<6>In MAYREDO, allowing 5 retries, have 0
<6>internal_cmnd (host = 1, target = 4, command = 00757a04, buffer =  002140c0, 
<6>bufflen = 1024, done = 001a454c)
<6>queuecommand : routine at 0303ea74
<6>ncr53c815-0-<4,0>: CMD=8 ncr53c815-0: queuepos=2 tryoffset=20.
<6>leaving internal_cmnd()
<6>]

<snip: scsi0 log>

<6>[<2|0:c0|27dae8:19000002>P13 RL=2 D=0 SS0=1 
<6>CP=0027a018 CP2=0027a018 DSP=27dae8 NXT=27dae8 VDSP=0027dae0 CMD=19 OCMD=19
<6>TBLP=0027a08c OLEN=10 OADR=757a68
<6>ncr53c815-0-<4,0>: newcmd[0] 9000002 757a76 80080000 27dae8.
<6>]
<6>[F CCB=18 STAT=4/80
<6>
<6>ncr53c815-0: sense data: 70 0 3 0 0 0 0 6 0 0 0 0 11 1.
<6>Non-zero result in scsi_done 2 4:0
<6>In scsi_done(host = 1, result = 000002)
<6>scsi1 : extra data not valid Current error 820: sense key Medium Error
<6>Additional sense indicates Read retries exhausted
<6>
<6>In MAYREDO, allowing 5 retries, have 1
<6>scsi1 : resetting for second half of retries.
<6>scsi: reset(1)
<6>Danger Will Robinson! - SCSI bus for host 1 is being reset.
<6>ncr53c815-0: restart (scsi bus reset).
<6>ncr53c815-0: final value of dmode/ctest4/ctest5 = 0xc0/0x00/0x00
<6>scsi reset function returned 2
<6>performing request sense
<6>]

And now it hangs forever ...
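
For reference, the command and sense bytes in the log can be decoded by hand.
The following is only a minimal sketch in C, assuming that the first six
bytes of the "command :" line form a READ(6) CDB (opcode 0x08) and that the
sense data is in fixed format (response code 0x70). The struct and function
names are made up for illustration and do not come from any driver.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper types, not taken from the kernel sources. */
struct read6_info {
    uint32_t lba;      /* 21-bit logical block address */
    unsigned blocks;   /* transfer length in blocks */
};

struct sense_info {
    uint8_t response_code; /* byte 0, low 7 bits (0x70 = current error) */
    uint8_t sense_key;     /* byte 2, low 4 bits */
    uint8_t asc;           /* byte 12, additional sense code */
    uint8_t ascq;          /* byte 13, additional sense code qualifier */
};

/* Decode a 6-byte READ CDB. Returns 0 on success, -1 if not opcode 0x08. */
int decode_read6(const uint8_t cdb[6], struct read6_info *out)
{
    if (cdb[0] != 0x08)
        return -1;
    /* The top 3 bits of byte 1 carry the LUN in SCSI-1/SCSI-2. */
    out->lba = ((uint32_t)(cdb[1] & 0x1f) << 16)
             | ((uint32_t)cdb[2] << 8)
             | (uint32_t)cdb[3];
    /* A transfer length of 0 means 256 blocks in READ(6). */
    out->blocks = cdb[4] ? cdb[4] : 256;
    return 0;
}

/* Decode fixed-format sense data. Returns 0 on success, -1 otherwise. */
int decode_fixed_sense(const uint8_t *buf, size_t len, struct sense_info *out)
{
    if (len < 14)
        return -1;
    out->response_code = buf[0] & 0x7f;
    if (out->response_code != 0x70 && out->response_code != 0x71)
        return -1;
    out->sense_key = buf[2] & 0x0f;
    out->asc = buf[12];
    out->ascq = buf[13];
    return 0;
}
```

Fed with 08 00 00 04 02 00 this gives LBA 4 and a length of 2 blocks, and
with the logged sense bytes it gives sense key 3 (Medium Error) with
additional sense 0x11/0x01, which is what the mid-level driver printed as
"Read retries exhausted".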

From what I understand so far, the low-level driver performed a successful
reset of the SCSI bus and the mid-level driver should now re-queue the
command for another retry, but nothing happens. (For these insights I have
to thank Gerard Roudier <groudier@club-internet.fr>)

He wrote on Jan 20 21:07:05 1997:

> I have had a look in the reset routine of the AHA driver.
> In my standard 1.2.13 distribution the "outb" that should reset something
> is #ifdefed 0. But the driver tries to cancel the corresponding command.
> In your case, the command is obviously not in the MAILBOX of the AHA,
> so it will do nothing. Then it returns SCSI_RESET_PUNT.
>
> The conclusion is that the AHA driver just returns SCSI_RESET_PUNT and 
> bogusly but fortunately does NOT reset anything.
>
> The NCR drivers actually reset the SCSI bus in the same situation.
>
> On SCSI_RESET_PUNT the scsi middle driver performs a request_sense() so,
> indirectly, it will restart the command or finish it later on scsi_done().
>
> The NCR drivers return SCSI_RESET_SUCCESS. In that situation it seems
> to me that the scsi middle driver does NOT requeue the command.
>
> I am not surprised at all that all went well with the AHA since it just
> retries the command 5 times without performing a SCSI reset.
>
> My conclusion is that bugs are very probably in the scsi driver and in
> the AHA driver, and that the AHA driver is bogusly fortunate to not
> perform the unnecessary SCSI reset requested by the scsi middle-level
> driver.
>
> Gerard.
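
To make that hypothesis concrete, here is a toy model in C of the retry path
as Gerard describes it. It is only a sketch and not the actual scsi.c code:
the enum names and the function are invented for illustration, and the rule
"reset once the retry count reaches half of the allowed retries" is inferred
from the log above (5 retries allowed, reset right after "have 1").

```c
#include <stdbool.h>

/* Hypothetical names; these are NOT the kernel's constants or values. */
enum model_reset { MODEL_RESET_PUNT, MODEL_RESET_SUCCESS };
enum model_outcome { MODEL_ERROR_REPORTED, MODEL_STALLED };

/*
 * Model a READ that fails with Medium Error on every attempt.
 * 'allowed' is the retry limit; 'reset_result' is what the low-level
 * driver's reset routine is assumed to return.
 */
enum model_outcome model_failing_read(int allowed,
                                      enum model_reset reset_result)
{
    int retries = 0;
    bool was_reset = false;

    for (;;) {
        /* The attempt failed; count it. */
        retries++;
        if (retries >= allowed)
            return MODEL_ERROR_REPORTED; /* retries used up, caller sees error */

        /* "resetting for second half of retries" */
        if (retries >= allowed / 2 && !was_reset) {
            was_reset = true;
            if (reset_result == MODEL_RESET_SUCCESS)
                return MODEL_STALLED;    /* per the hypothesis: no requeue */
            /* PUNT: request sense is issued, which restarts the command. */
        }
    }
}
```

With allowed = 5, the PUNT case runs through all retries and ends with the
error being reported, which matches what I see on the AHA. The SUCCESS case
stops at the reset, which matches the hang on the NCR, provided the
mid-level driver really does not requeue the command there.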

BTW, I tried the same with Slackware 3.1 (kernel 2.0.0) and got exactly
the same effect, although from a quick look it seems that the scsi.c
code has been substantially rewritten.

Any clues? ... Michael U.
