[3041] in linux-scsi channel archive

RE: Dual-hosting SCSI and failure modes (was Re: RAID & unhappy

daemon@ATHENA.MIT.EDU (Doug Ledford)
Tue Jan 6 01:18:10 1998

In-Reply-To: <Pine.LNX.3.96.980105225109.16349F-100000@roadrunner.realbig.com>
Date: 	Tue, 06 Jan 1998 00:05:48 -0600 (CST)
From: Doug Ledford <dledford@dialnet.net>
To: Andy Poling <andy@globalauctions.com>
Cc: linux-scsi@vger.rutgers.edu, linux-raid@vger.rutgers.edu, linas@linas.org


On 06-Jan-98 Andy Poling wrote:
>On Mon, 5 Jan 1998 linas@linas.org wrote:
>> Well, actually ... the reason I went with s/w raid is that I was 
>> hoping to build a dual-cpu system, with one scsi bus attached to
>> two servers.  I haven't yet gotten anywhere with this.
>> 
>> While I have your ear, a couple of quick questions:
>> 
>> -- Having two cpu's accessing the bus at the same time should be OK, 
>>    as long as they are not accessing the same partitions, right?
>>    That is, the only reason to avoid dual access is to not
>>    mangle a file system, right?
>
>The drivers and kernel code (mostly) appear to be written under the
>assumption that they are the only active host controller on the bus.  They
>do things like reset the bus when things are going poorly, etc. that might
>be considered un-neighborly behavior if there were more than one system and
>host controller present...

The Adaptec aic7xxx driver doesn't have any problem with this.  We will
currently share a bus with another controller, including things such as
keeping track of when bus resets occur, regardless of who does them, and
performing the appropriate action.  Actually, the spec is fairly well
defined on how to operate here and it isn't too hard to implement, but the
card has to be able to signal the kernel that someone other than the card
tripped the reset pin.
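
As a rough illustration of "performing the appropriate action", here is a
minimal Python sketch of a host that does the same cleanup no matter who
reset the bus.  This is not aic7xxx code.  The class and field names are
hypothetical, and the cleanup steps (drop negotiated sync/wide agreements,
requeue commands that were in flight) are an assumption based on general
SCSI behavior after a bus reset.

```python
from dataclasses import dataclass, field


@dataclass
class TargetState:
    # Transfer agreements negotiated with one target (hypothetical model).
    sync_negotiated: bool = False
    wide_negotiated: bool = False


@dataclass
class SharedBusHost:
    """Hypothetical model of one initiator on a bus shared with another host."""
    targets: dict = field(default_factory=dict)      # target id -> TargetState
    in_flight: list = field(default_factory=list)    # sent, not yet completed
    retry_queue: list = field(default_factory=list)  # to be reissued later
    foreign_resets: int = 0                          # resets we did not cause

    def on_bus_reset(self, initiated_by_us: bool) -> None:
        # The cleanup is identical whoever pulled the reset line; we only
        # keep a count of resets caused by the other host.
        if not initiated_by_us:
            self.foreign_resets += 1
        # A bus reset returns every target to async/narrow transfers,
        # so the old agreements must be renegotiated.
        for target in self.targets.values():
            target.sync_negotiated = False
            target.wide_negotiated = False
        # Commands outstanding at the time of the reset are lost on the
        # bus, so move them to a queue for reissue.
        self.retry_queue.extend(self.in_flight)
        self.in_flight.clear()
```

The key point is that the handler has to run even when `initiated_by_us` is
false, which is only possible if the card reports resets it did not cause.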

>On the other hand, electronically controlled SCSI bus switches are
>available so that the two host controllers need never actually meet on the
>bus.

Well, if you use that, then that nifty target mode support couldn't work :)

>If you ask me, a bigger problem for recovery after a failed drive is the
>current device naming scheme that could potentially result in *all* of the
>remaining disks having different names after one is no longer available
>(presently hot swapped out, powered down or otherwise totally
>unresponsive).

That's not a problem at all in the scenario we were discussing, i.e., you
never want the machine rebooted.  It should stay running through these
failures.  If you stay running, then you don't rescan the bus and the device
names don't change; the only difference is that the RAID code would have to
disable the failed device until a new one was on-line and could be
re-built.  Presumably, the RAID1 and RAID5 code might benefit from the
ability to sense the dead drive, stop all commands to that drive, then allow
the user to signal that the drive has been replaced, have the SCSI bus
re-identify the new drive, then add the new drive into the array and start
the rebuild, all without rebooting (I have no clue as to whether this can be
done or not with the current RAID code).
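
The sequence described above can be sketched as a small state machine for
one array member.  This is only an illustration in Python, not the MD
driver: the state names, event names, and the `step` and `accepts_io`
helpers are all hypothetical.

```python
from enum import Enum, auto


class MemberState(Enum):
    ACTIVE = auto()      # healthy member of the array
    FAILED = auto()      # dead drive sensed, commands stopped
    REPLACED = auto()    # user has signalled that a new drive is in place
    REBUILDING = auto()  # new drive identified, data being reconstructed


# Allowed (state, event) -> next state transitions (hypothetical events).
TRANSITIONS = {
    (MemberState.ACTIVE, "io_error"): MemberState.FAILED,
    (MemberState.FAILED, "user_replaced"): MemberState.REPLACED,
    (MemberState.REPLACED, "rescan_ok"): MemberState.REBUILDING,
    (MemberState.REBUILDING, "rebuild_done"): MemberState.ACTIVE,
}


def step(state: MemberState, event: str) -> MemberState:
    """Return the next state, or raise ValueError for an out-of-order event."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"event {event!r} not allowed in {state.name}") from None


def accepts_io(state: MemberState) -> bool:
    """Commands go to a member only while it is active or being rebuilt."""
    return state in (MemberState.ACTIVE, MemberState.REBUILDING)
```

Feeding the events `io_error`, `user_replaced`, `rescan_ok`, `rebuild_done`
through `step` in that order takes a member from `ACTIVE` back to `ACTIVE`
with no reboot step anywhere in the cycle.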

>Related item for the "it would make MD RAID truly awesome" wish list: scan
>all present disks for the MD RAID headers (trailers?) and automatically
>construct the correct device list to make up each MD device.  Other
>proprietary logical volume support (SGI XLV) does this, and while it's a
>little disconcerting the first time you see it in action, it's very nice to
>have when the chips are down.

Yep, I could actually do away with that damn initrd I use to set up my root
RAID0 array :)
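
For what it's worth, the scan-and-assemble idea is simple to sketch.  The
Python below is an illustration only: the header layout (an `array_id` and
a `slot` per disk) and the `assemble` function are hypothetical, not the
actual MD on-disk format.  It assumes something else has already read each
disk and returned either a parsed header or `None`.

```python
from collections import defaultdict


def assemble(headers):
    """Group disks into arrays using only what their headers say.

    `headers` maps a device name to a parsed header dict with the
    hypothetical keys "array_id" and "slot", or to None when the disk
    carries no RAID header.
    """
    arrays = defaultdict(dict)
    for device, header in headers.items():
        if header is None:
            continue  # not part of any array
        arrays[header["array_id"]][header["slot"]] = device
    # Order each array's members by slot, not by device name.
    return {
        array_id: [slots[i] for i in sorted(slots)]
        for array_id, slots in arrays.items()
    }
```

Because membership and ordering come from the headers, the result stays the
same even if the disks show up under different names after a drive
disappears.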


----------------------------------
E-Mail: Doug Ledford <dledford@dialnet.net>
Date: 06-Jan-98
Time: 00:05:49
----------------------------------
