[4695] in linux-scsi channel archive


Re: Panic in scsi.c ( and a fix)

daemon@ATHENA.MIT.EDU (Ishikawa)
Tue Sep 15 14:59:02 1998

Date: 	Wed, 16 Sep 1998 02:28:45 +0900
From: Ishikawa <ishikawa@yk.rim.or.jp>
Reply-To: ishikawa@yk.rim.or.jp
To: Richard Waltham <dormouse@farsrobt.demon.co.uk>, garloff@kg1.ping.de,
        linux-kernel@vger.rutgers.edu, linux-scsi@vger.rutgers.edu,
        alan@cymru.net, trevor@jpj.net, info@lianelle.com.au

Hello,

I further tried to investigate the problem caused by

   dd if=/dev/scd2 of=/dev/null &
   dd if=/dev/scd3 of=/dev/null

where /dev/scd2 and /dev/scd3 are devices on a Nakamichi MBR multi-LUN SCSI
CD changer.

(1) I have now repeated the test on 2.0.36-pre9.

(2) The problem is still there: the kernel appears to hang, and the cause
is a tight loop within allocate_device().

    So 2.0.36-pre9 plus the previous patch does not seem to
    solve the problem for me.

(3) Reproduced on two PCs: DC390 and BT-930.

    I tested this with the Nakamichi MBR-7 CD changer on
    two PCs: one uses a DC390 and the other a BusLogic BT-930
    SCSI host adaptor.
    The symptoms are the same.
    Both adaptors seem to allocate the data structures from the higher LUN
    to the lower LUN: the loop in allocate_device() shows that
    SCpnt is scanned from the higher LUN to the lower LUN.

    Is it possible that the previous patch does not cure the bug
    when the adaptor allocates the higher LUN first?
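
    One way a higher-to-lower scan order can arise is if each new command
    block is attached at the head of a singly linked list. The following is
    a minimal user-space sketch of that effect only. It assumes head
    insertion, which has not been confirmed against scsi.c, and the names
    struct cmd_block, push_front(), attach_luns() and scan_order() are
    hypothetical stand-ins, not kernel identifiers.

```c
#include <stddef.h>

/* Hypothetical stand-in for a per-LUN command block (not the kernel's type). */
struct cmd_block {
    int lun;
    struct cmd_block *next;
};

/* Attach a block at the head of the list and return the new head. */
static struct cmd_block *push_front(struct cmd_block *head, struct cmd_block *blk)
{
    blk->next = head;
    return blk;
}

/* Attach LUNs 0..n-1 in ascending order, each one at the head (assumption). */
static struct cmd_block *attach_luns(struct cmd_block *storage, int n)
{
    struct cmd_block *head = NULL;
    for (int lun = 0; lun < n; lun++) {
        storage[lun].lun = lun;
        head = push_front(head, &storage[lun]);
    }
    return head;
}

/* Walk the list head to tail and record the LUNs in the order visited. */
static int scan_order(const struct cmd_block *head, int *out, int max)
{
    int count = 0;
    for (const struct cmd_block *p = head; p != NULL && count < max; p = p->next)
        out[count++] = p->lun;
    return count;
}
```

    With five LUNs attached in the order 0 to 4, the walk visits 4, 3, 2, 1, 0,
    which matches the direction of the scan described above. Whether the
    adaptors actually build the list this way is still an open question.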

A few things worth mentioning.

(4) THE SAME CHANGE NECESSARY(?): request_device() in scsi.c

    The fix for allocate_device() changes the checking logic slightly.
    It turns out that request_device(), somewhere above
    allocate_device() in scsi.c, has EXACTLY the same loop.
    Shouldn't request_device() be changed in a similar manner?

    I temporarily modified the request_device() routine in a similar
    manner. The mods don't seem to have any adverse effect.
    (I tested this for an hour or so with "ls -lR" on mounted
     CDs in parallel, etc.)

    However, the mods didn't cure the problem shown by

    dd if=/dev/scd2 of=/dev/null &
    dd if=/dev/scd3 of=/dev/null

    which still results in the tight loop within allocate_device().

(5) PERFORMANCE BUG in scsi.c? 

    I inserted printk() statements in request_device() as well as
    in allocate_device().

    After checking the logs for the last few days, I noticed the following.
    Devices without multiple LUNs, such as simple disks, seem to be handled
    by the more complex branch of the if statement in the while(SCpnt)
    loop in both allocate_device() and request_device().
    That is, both the MBR-7 CD changer and my SCSI disks seem to
    generate requests that follow the same path in these routines
    and produce output from the printk() statement which I thought would be
    reached only for multi-LUN devices such as the MBR-7.

    I am not sure whether this was intended or not.
    However, this may be a hidden performance bug.
  
(6) In my previous message about the problem I mentioned the strange
    repetition of the loop for the same LUN.
    I added a printk() in allocate_device() that prints the LUN when
    the "found=..." branch is taken and when target_busy is set.
    On the PC with the BT-930 adaptor, the LUNs were scanned from higher
    to lower. On the first pass over a given LUN, the found=... branch
    or the target_busy... branch is taken; then two more passes seem to
    be made over the same LUN before the scan moves on to the next lower LUN.
    E.g., a digested excerpt from the klogd output file,
        from the printk() statement in the Waltham patch:
        (0, 6, 4)
        (0, 6, 4)
        (0, 6, 4)
        ...
        (0, 6, 2)
        (0, 6, 2)
        (0, 6, 2)
        (0, 6, 1)  <--- (found set here, for example.)
        (0, 6, 1)
        (0, 6, 1)
        (0, 6, 0)
        (0, 6, 0)
        (0, 6, 0)

    I still wonder why there are multiple loops on the same LUN
    and, for that matter, why it has to loop on the same LUN
    after found or target_busy is set.

    I wonder if this is caused by the MBR-7 returning 2048 bytes/block while
    the SCSI subsystem tries to handle this in multiples of 512 bytes/block.
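
    To digest a long klogd excerpt like the one above, runs of identical
    tuples can be collapsed into counts. The following is a minimal
    user-space sketch. It assumes each log line has the form "(a, b, c)"
    and that the third field is the LUN; the meaning of the first two
    fields is not stated here, so they are kept as opaque numbers. The
    names struct run and collapse_runs() are hypothetical and are not part
    of scsi.c or the Waltham patch.

```c
#include <stdio.h>

/* One run of consecutive identical tuples (hypothetical helper type). */
struct run {
    int f0, f1;   /* first two tuple fields, meaning not assumed */
    int lun;      /* third tuple field, assumed to be the LUN */
    int count;    /* number of consecutive lines carrying this tuple */
};

/*
 * Collapse consecutive identical "(a, b, c)" lines into runs.
 * Lines that do not parse as a tuple (e.g. "...") are skipped.
 * Returns the number of runs written to out (at most max).
 */
static int collapse_runs(const char *const *lines, int n,
                         struct run *out, int max)
{
    int used = 0;

    for (int i = 0; i < n; i++) {
        int f0, f1, lun;

        if (sscanf(lines[i], " (%d , %d , %d", &f0, &f1, &lun) != 3)
            continue;

        if (used > 0 && out[used - 1].f0 == f0 &&
            out[used - 1].f1 == f1 && out[used - 1].lun == lun) {
            out[used - 1].count++;   /* same tuple again: extend the run */
            continue;
        }

        if (used == max)
            break;                   /* no room for another run */

        out[used].f0 = f0;
        out[used].f1 = f1;
        out[used].lun = lun;
        out[used].count = 1;
        used++;
    }
    return used;
}
```

    Fed the excerpt above, this yields four runs of three lines each, for
    LUNs 4, 2, 1 and 0 in that order. Note that skipped lines such as "..."
    do not break a run, so elided parts of a log are not accounted for.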

(7) One more thing.
    
    I made sure that a single command such as

    dd if=/dev/scd3 of=/dev/null

    works just fine.

    Just for testing, I used the SAME device in two parallel
    commands as follows.

    dd if=/dev/scd3 of=/dev/null &
    dd if=/dev/scd3 of=/dev/null

    The same hang! (Tight loop in allocate_device() resulting in
    many printk() output lines for LUNs.)
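
    Because a tight loop floods the log, diagnostic output inside it may
    be easier to read if it is throttled. The following is a minimal
    sketch of a counter-based throttle, written as plain C so it can be
    tried in user space. The name should_log() and its interval parameter
    are hypothetical; in the kernel the returned flag would simply guard
    the printk() call.

```c
/*
 * Hypothetical throttle: returns 1 on the first call and then on every
 * interval-th call after that, 0 otherwise. An interval of 0 means
 * "log every time". The caller owns the call counter.
 */
static int should_log(unsigned long *calls, unsigned long interval)
{
    unsigned long n = (*calls)++;   /* index of this call, starting at 0 */

    if (interval == 0)
        return 1;
    return (n % interval) == 0;
}
```

    With an interval of 100, a loop that spins 1000 times would produce 10
    lines instead of 1000, while the counter itself still records how many
    passes were made.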


This is what I have found over the last couple of days.
I wonder what difference between my setup and Richard Waltham's
setup makes his patch ineffective against this problem on my PC.
(The LUN scan order?)

Again, please let me know if someone wants to debug this
problem. I would be happy to insert printk() statements to
collect information.
 

Chiaki Ishikawa
   
PS: Again, I only read the linux-scsi mailing list...
