[5491] in linux-scsi channel archive

home help back first fref pref prev next nref lref last post

Re: Kernel Panic (aha1542.c)

daemon@ATHENA.MIT.EDU (Guest section DW)
Sun Dec 27 16:22:36 1998

Date: 	Sun, 27 Dec 1998 22:13:47 +0100 (MET)
From: dwguest@win.tue.nl (Guest section DW)
To: linux-kernel@vger.rutgers.edu, linux-scsi@vger.rutgers.edu,
        plr@interlog.com

    From: Peter Rasmussen <plr@interlog.com>
    Subject: Kernel Panic (aha1542.c)

    During a badblocks scan of a Seagate 1239NS (200MB) SCSI harddisk sitting on
    an AHA1542B adapter I got a number of these:

    aha1542.c: Trying device reset for target 0

    And then finally this:

    Kernel panic: unable to find empty mailbox for aha1542.

    As the disk is getting old it may have had a number of bad blocks on it, but
    I would prefer if I didn't have to reset the whole system just because of that.

Yes. Roughly speaking this is the fault of scsi_error.c
that insists on either recovering from an error or otherwise
crashing the system. If you read some linux-kernel backlog
you'll notice lots of people with a bad CD or bad tape where
scsi_error.c decided to crash an otherwise fine functioning system.

The code goes like:
	scsi_error_handler() calls scsi_unjam_host()
		[already the name is misleading - nothing is wrong with the host (adapter)]
	scsi_unjam_host() finds a command that timed out, and then does
		scsi_try_to_abort_command() - which fails; then
		scsi_try_bus_device_reset() - which fails; then
		scsi_try_bus_reset() - which fails; then 
		scsi_try_host_reset() - which fails; then
		device->online = FALSE;

This especially hits aha1542 users, since the aha1542 driver is rotten
(it is fine as long no aborts or resets are tried, but worse than worthless
as soon as aborts/resets occur), so that the scsi_error routines cannot
trust the driver responses.


You can improve the stability of your system quite a lot
by #ifdef'ing out the scsi_try_bus_reset() and scsi_try_host_reset() parts
of the above. A device error is not an indication that the host adapter is
malfunctioning and the SCSI bus needs resetting.


You can also improve the stability of your system a lot
by rewriting aha1542.c. I did this last August for a friend,
and he is happy now, but I need to experiment a bit more
to find out what this Adaptec precisely does upon an abort,
before this new driver can be released to the public.
[In fact I just started looking at this again this evening.]
Unfortunately, Adaptec itself seems to be very unhelpful.


    Looking through the code in aha1542.c I found many comments about trouble with
    resetting the bus, devices and the host, so it looks like an area that might
    need some attention :-)

    As I have a[n older] version of the adapter and what looks like a good testbed
    for problems (an old disk with bad blocks) I would like to look into this. I do
    however, not have the jumper-settings sheet anymore (lost between 1992 and now)
    so I am not really aware of the settings. So if someone would let me know I 
    would appreciate it. Any other information about the adapter would be useful
    as well, so if you have that or pointers to where I can get it, please let me
    know.


There is some information. Eric Youngdale provided a file 1540USR.MAN
that is also available at http://ftp.digital.com/pub/micro/adaptec/ and
http://ftp.dei.uc.pt/pub/adaptec/ and http://ftp.rad.kumc.edu/share/dos/adaptec/.
The firmware is also available (but I don't know for what kind of micro it is).

    Any suggestions from any of you guys would probably be helpful, eg. "try this"
    or "don't do that". For a starter, I would like to know what all this about
    mailboxes is about, as it is all over the code in aha1542.c ?

Mailboxes are the means of communication between you and the AHA1540.
Read 1540USR.MAN.

In case you want to discuss details, mail me - aeb@cwi.nl - do not reply.

Andries

-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.rutgers.edu

home help back first fref pref prev next nref lref last post