[3038] in linux-scsi channel archive
Re: new scsi code
daemon@ATHENA.MIT.EDU (Edward Welbon)
Mon Jan 5 21:24:17 1998
Date: Mon, 5 Jan 1998 19:56:53 -0600 (CST)
From: Edward Welbon <welbon@bga.com>
Reply-To: welbon@bga.com
To: Eric Youngdale <eric@andante.jic.com>
cc: Gerd Knorr <kraxel@goldbach.isdn.cs.tu-berlin.de>,
linux-scsi@vger.rutgers.edu, Andrew Sapozhnikov <sapa@hq.icb.chel.su>
In-Reply-To: <Pine.LNX.3.95.980105080903.412O-100000@andante.jic.com>
On Mon, 5 Jan 1998, Eric Youngdale wrote:
> *****************
> To: Ed Welbon - welbon@bga.com
>
> I am going to have to get more information from you. If you set
> the 1542 to use the old error handling (I believe you may need to add a
> line in scsi_syms.c to export scsi_sleep()). From your mail message, it
> sounds like something is going wrong during the bus scan phase when the
> 1542 is compiled into the kernel. If your root filesystem isn't on the
> 1542, then you can use the kernel command line option:
>
> scsi_logging=1
I'll try this later tonight. I wasn't very clear or complete in describing
my problem, so here are the horrible details. I mount my root partition on
a raid, which makes debugging harder (though I am willing). So here is the
background.
The raid is nine disks with three disks each on three ncr53c875 cards.
The actual boot disk (i.e. lilo.conf has boot=/dev/sda) is attached to the
aha1542. To mount the raid0, I boot from the 1542 using an initrd: the 1542
card firmware happily loads the kernel and the initial ramdisk filesystem
(i.e. lilo.conf has image=/boot/zImage and initrd=/boot/zImage.gz). Once
the kernel is up and has mounted the initial ramdisk (whose contents are in
zImage.gz), it attempts to start the raid and switch root to the raid in
the usual way with linuxrc.
If aha1542 is built as a module, linuxrc attempts to insmod aha1542, but
aha1542 fails to load. I don't recall the specific failure, but because the
module doesn't load and scan its own bus, the kernel never finds the 1542's
disk (the boot media) and never assigns it a name. As a result, the next
disk, which is supposed to be /dev/sdb, shows up as /dev/sda instead, and
this screws me up something awful because the raid is started with
/dev/sdb. If I renumber the raid devices to start with /dev/sda, then
the raid comes up and the boot can proceed.
If I build aha1542 into the kernel, then it seems to find the devices on
its own scsi bus OK, but when ncr53c8xx loads, it only finds
/dev/sdb /dev/sdc /dev/sdf /dev/sdj and does not find /dev/sdd
/dev/sde /dev/sdg /dev/sdh /dev/sdi. The mdadd fails and there is no boot 8-(.
If both aha1542 and ncr53c8xx are built into the kernel, the result is
pretty much the same: md_add() finds that devices /dev/sdd /dev/sde
/dev/sdg /dev/sdh /dev/sdi have zero size and bails out.
I tried modifying the AHA1542 initialization in the original aha1542.h
file by setting use_new_eh_code to zero. This did not work.
So I brought in the 2.1.74 versions of aha1542.c and aha1542.h,
and after suitable modifications to the older aha1542.h, I am up and
running with 2.1.77.
One problem I have in debugging this is that I can't boot at all when the
1542 driver is broken (the filesystem on the single disk attached to the
1542 is not really bootable). I can fix this, though I don't know if I can
get to that point tonight. Another possibility is to have linuxrc give me
a shell prompt just before the ncr53c8xx module loads. At that point I
could mount /proc, and if I can put the console on a serial line I could
cut and paste etc., so maybe I can get more details that way.
It is not your concern, but when I ran into the problem with 2.1.75, I
happened to have an error in /dev on the initial ramdisk filesystem
such that linuxrc would run but could not access stdout or invoke a
shell (that makes initrd setups really hard to debug). It nearly drove me
crazy. Why is it that when a bug bites you, it is always in conjunction
with other bugs that greatly confuse things?
8-)
Ed Welbon - welbon@bga.com