[6160] in linux-scsi channel archive

Linux SCSI mystery (long war story)

daemon@ATHENA.MIT.EDU (De Clarke)
Wed Mar 24 01:16:34 1999

From: De Clarke <de@ucolick.org>
Date: 	Tue, 23 Mar 1999 22:05:12 -0800 (PST)
To: linux-scsi@vger.rutgers.edu


Once upon a time :-) there was a happy Linux system
that was purchased preconfigured from PromoX.  It
had a SCSI CDROM and boot disk on an NCR 53c825 PCI
SCSI controller.  It ran troublefree for 2.5 years
under various versions of Red Hat Linux.  It was
never polluted by any form of DOS or WinDoze.
The motherboard was an Intel Advanced/EV (Baby AT)
with a P133.  It stayed up for months at a time
and everything just worked.  Those were the days :-)

Because everything just worked for so long, even
new cards being plugged in like a modem and a MIDI
interface, I never became expert in nasty dirty 
things like Intel bus configuration and BIOS.
So, though I'm a long time Unix and Linux hacker,
I'm quite a newbie when it comes to Intel hardware.

		------------

Then on one ill-fated day I decided the old box was
kinda slow and I wanted to upgrade it.  The first
upgrade was to add 64 MB of memory.  That worked
great.  Then a couple of IDE disks.  Those worked
great.  Then, gee, what about those Evergreen CPU
upgrades -- you can put a 333MHz processor in a
Socket 5 like the one on my board, sounds good...
The other upgrades were so painless, what the heck.
Give it a try.

Well, the Evergreen Spectra CPU upgrade required
a different BIOS.  MRBIOS in fact.  I exchanged 
several mail messages with tech support at Evergreen,
since all their doco assumes that you run WinDoze.
Eventually we got the right BIOS onto their flash
diskette and I flashed it, first (of course!) backing
up my native AMI BIOS.

I rebooted (but didn't power cycle) and Linux came
up bright and cheery.  Everything still worked.  All
the peripherals behaved.  So the next step was to
install the Evergreen CPU, heat sink, fan, etc.
Did that, it was easy, rebooted, and got as far as

	LI_ <hang>

and everything stopped.  OK, the CPU doesn't work.
No problem, I can return it.  Put the old CPU back
in and reboot.

	LI_ <hang>

Sinking feeling.  Well you can imagine the rest of
that evening :-)  as every attempt to get booted 
fails.  Not only that, the symptoms get worse:
after a few power cycles the LI_ hang behaviour
changes to 

	L 01 01 01 01 01 01...

Because of certain peculiarities of the flash
diskette from Evergreen it is 2 days and a lot of
heated tech support exchanges before I get my
original AMI BIOS restored.  That should fix it.
Nope.

	L 01 01 01 01 01 01 01...

This is getting very bad.  Looks like this disk
is no longer a boot disk no matter which BIOS I
use.  So, OK, break out the Red Hat 5.1 distrib
and reinstall from scratch.  All my user files are
on the IDE drives now anyway, I have little to lose
by reformatting the boot disk.  Maybe I can just
start over.

		-------------

I boot the RH install diskette and get into the
installer.  I choose Local CDROM and the right
card and it says it can't locate the device.
When I escape to the session log screen I see

...
no SCSI devices available
in LoadDeviceDriver, ks=0, typNam=SCSI
PCI probing for SCSI devices
PCI probe found 1 SCSI devices
found driver for NCR 53C8xx PCI
running: /bin/insmod /bin/insmod /modules/ncr53c8xx.o
insmod failed!
/proc/scsi/scsi:  Attached devices none

no SCSI devices available
in LoadDeviceDriver, ks=0, typNam=SCSI
PCI probing for SCSI devices
PCI probe found 1 SCSI devices
found driver for NCR 53C8xx PCI
running: /bin/insmod /bin/insmod /modules/ncr53c8xx.o
insmod failed!
picked driver NCR 53C8xx PCI
running: /bin/insmod /bin/insmod /modules/ncr53c8xx.o
insmod failed!
picked driver Adaptec 152x
...

And I say to myself gaaah!  What does it mean it
can't insmod the NCR driver?  That's a standard
driver.  And what the hell is it doing trying to use
an Adaptec driver with my NCR card?  So I try a RH
5.2 install diskette and it does basically the same
thing.  The sinking feeling is now very intense.

		----------------------

So I take the SCSI disk to work and hang it on
a working SCSI bus there (Adaptec controller).  I take
a look at the partition table and it looks terminally
weird:

Disk /dev/sdc: 255 heads, 63 sectors, 263 cylinders
Units = cylinders of 16065 * 512 bytes

   Device Boot    Start      End   Blocks   Id  System

/dev/sdc1   *         1        9    66496+  83  Linux native
Partition 1 has different physical/logical endings:
     phys=(31, 65, 63) logical=(8, 71, 63)
Partition 1 does not end on cylinder boundary:
     phys=(31, 65, 63) should be (31, 254, 63)

/dev/sdc2             9       26   135135   82  Linux swap
Partition 2 has different physical/logical beginnings (non-Linux?):
     phys=(32, 0, 1) logical=(8, 72, 1)
Partition 2 has different physical/logical endings:
     phys=(96, 65, 63) logical=(25, 26, 63)
Partition 2 does not end on cylinder boundary:
     phys=(96, 65, 63) should be (96, 254, 63)

/dev/sdc3            26       91   525987   83  Linux native
Partition 3 has different physical/logical beginnings (non-Linux?):
     phys=(97, 0, 1) logical=(25, 27, 1)
Partition 3 has different physical/logical endings:
     phys=(349, 65, 63) logical=(90, 149, 63)
Partition 3 does not end on cylinder boundary:
     phys=(349, 65, 63) should be (349, 254, 63)

/dev/sdc4            91      263  1384614   83  Linux native
Partition 4 has different physical/logical beginnings (non-Linux?):
     phys=(350, 0, 1) logical=(90, 150, 1)
Partition 4 has different physical/logical endings:
     phys=(1015, 65, 63) logical=(262, 245, 63)
Partition 4 does not end on cylinder boundary:
     phys=(1015, 65, 63) should be (1015, 254, 63)

I don't like the look of that.  I plug the SCSI card from
my home system into the PCI bus of this working machine,
and connect a CDROM to it (I figure you can't hurt a 
CDROM much).  It is not visible;  the PCI card's individual 
BIOS comes up, but the OS never sees the controller.
I find out that this kernel looks like it's configured
without the NCR driver.  So I try an insmod on what
looks like the NCR modular driver and that fails, says
it's not a valid module.

So how am I gonna figure out if my card still works?

I also take a look around the doco for driver modules
and I read scsi.txt in /usr/src/linux/Documentation:

-------------------------------------------------------------
        The lower level drivers are the ones that support the
individual cards that are supported for the hardware platform that you
are running under.  Examples are aha1542.o to drive Adaptec 1542
cards.  Rather than list the drivers which *can* be modularized, it is
easier to list the ones which cannot, since the list only contains a
few entries.  The drivers which have NOT been modularized are:

        NCR5380 boards of one kind or another including PAS16,
                Trantor T128/128F/228, 
-------------------------------------------------------------


So now I am completely hosed and completely confused.
And just full of questions.

1) is a change of BIOS really so dangerous that it can
   destroy the bootability of a disk permanently?  how
   come things didn't return to normal when I restored
   the original BIOS?

2) how come my disk partition table is so trashed?  can
   a mere BIOS actually write on my boot disk and trash
   it?  this disk has been partitioned the same for
   2.5 years and has never given a moment's trouble,
   how come all of a sudden it thinks all its partition
   start/end positions are wrong?

3) reading scsi.txt you would think that the NCR driver
   is not modularized (or is NCR5380 different from
   NCR53c8xx?)... why does the RH install diskette
   try to insmod it?  How come the scsi.txt file seems
   incomplete, as though it had been truncated after the
   first item in a list?

4) if the NCR driver *is* modularized then why does the
   RH diskette FAIL to insmod it?  what does this failure
   mean?

5) how the hell am I ever gonna get this system back on
   its feet?
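
On question 2, one way to test whether the table is really trashed
is to turn each phys/logical pair from the fdisk output into a linear
sector number and see whether the two agree.  The sketch below does
that arithmetic.  It takes 255 heads / 63 sectors from the fdisk
header for the "logical" values.  For the "phys" values it ASSUMES a
geometry of 66 heads / 63 sectors; that figure is not stated anywhere
above, it is only what you get by solving the first pair for the head
count.  The names Geometry, chs_to_lba and check_pairs are made up
for illustration.

```python
from typing import NamedTuple


class Geometry(NamedTuple):
    heads: int
    sectors: int  # sectors per track


def chs_to_lba(c: int, h: int, s: int, geom: Geometry) -> int:
    """Standard CHS -> linear sector conversion (sector numbers start at 1)."""
    return (c * geom.heads + h) * geom.sectors + (s - 1)


# Geometry printed in the fdisk header above (255 heads, 63 sectors).
REPORTED = Geometry(heads=255, sectors=63)

# ASSUMPTION: geometry in effect when the table was written.  Inferred by
# solving (31*H + 65) == (8*255 + 71) for H; not stated in the post.
ASSUMED_OLD = Geometry(heads=66, sectors=63)

# (label, phys CHS, logical CHS) copied from the fdisk warnings above.
PAIRS = [
    ("sdc1 end",   (31, 65, 63),   (8, 71, 63)),
    ("sdc2 start", (32, 0, 1),     (8, 72, 1)),
    ("sdc2 end",   (96, 65, 63),   (25, 26, 63)),
    ("sdc3 start", (97, 0, 1),     (25, 27, 1)),
    ("sdc3 end",   (349, 65, 63),  (90, 149, 63)),
    ("sdc4 start", (350, 0, 1),    (90, 150, 1)),
    ("sdc4 end",   (1015, 65, 63), (262, 245, 63)),
]


def check_pairs(pairs=PAIRS):
    """Return (label, lba_from_phys, lba_from_logical, agree) per pair."""
    results = []
    for label, phys, logical in pairs:
        lba_phys = chs_to_lba(*phys, ASSUMED_OLD)
        lba_logical = chs_to_lba(*logical, REPORTED)
        results.append((label, lba_phys, lba_logical, lba_phys == lba_logical))
    return results


if __name__ == "__main__":
    for label, lba_phys, lba_logical, agree in check_pairs():
        status = "agree" if agree else "DIFFER"
        print(f"{label:11s} phys->{lba_phys:8d} logical->{lba_logical:8d} {status}")
```

Under that assumed geometry all seven pairs land on the same sector,
which would suggest the entries are self-consistent and that fdisk is
complaining about a different head count on the Adaptec controller
rather than about overwritten data.  It does not show that nothing
else on the disk was damaged.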

I am hoping that you folks who are way more knowledgeable
than I about SCSI hardware will be able to give me some
clue as to what really happened when all hell broke loose
and my system became unbootable -- and even better, some
clue about how to recover.  I've been down now for a
week and a half.  Evergreen tech support is useless.
I don't have RedHat tech support because it has been more
than 90 days since I bought my last CDROM kit [grrr].

I've posted to comp.os.linux.setup but have got no useful
answers;  have joined and posted to, then unsubbed from 
redhat-list and redhat-install-list (no useful answers).
I've read more FAQs and old postings than I can remember
but no one seems to offer any really relevant info.

So, Obi-wan Kenobi, you're my only hope.  Please tell this
horror-struck hardware neophyte what I did that destroyed
my beloved Linux crate.

Sorry this is so long but I wanted to provide accurate
details, not just "it broke".

de


