[1854] in linux-scsi channel archive

Re: Latest aicxxx driver (fwd)

daemon@ATHENA.MIT.EDU (Doug Ledford)
Mon May 12 12:47:17 1997

To: yuri mironoff <yuri@buster.rgti.com>
cc: aic7xxx@FreeBSD.ORG, linux-scsi@vger.rutgers.edu
In-reply-to: Your message of "Mon, 12 May 1997 11:21:47 EDT."
             <Pine.A41.3.95.970512111830.111540B-100000@bdmg30.rgti.com> 
Date: 	Mon, 12 May 1997 11:43:25 -0500
From: Doug Ledford <dledford@dialnet.net>

--------
>    Sorry - forgot configuration:
> 
>    Linux 2.0.30 on a Dual PPro
>    2940UW + IBM DORS
>    May 8 driver
> 
> >   Hi!
> >
> >   Got the latest driver and rushed home to enable SCB paging. The machine
> > locked solid running bonnie. :(((
> >
> >					Regards,
> >
> >					Y.
> 

Dan, I think we might have a generic problem with the latest SMP stuff.  I 
assume this guy is running an SMP kernel, and I know Richard Johnson was 
having very bad problems under the latest 2.1.x kernels with SMP, using both 
the stock in-kernel aic7xxx driver and the May 8 driver.  I know the latest 
2.1.xx kernels have broken the mid-level SCSI code's protection against 
re-entrance into the driver, and it might be broken under 2.0.30 as well.  So, 
knowing that we possibly have a generic SMP problem, we probably need to do 
the following two things:

First, just as we have the IN_ISR flag in p->flags, we are going to need a 
similar IN_QUEUE flag.  (I think the latest kernels, and possibly 2.0.30 as 
well, on SMP machines have broken the old assumption that our queue routine 
is non-re-entrant except when an interrupt arrives during our only allowed 
queue thread.)  Second, we probably do need to do away with the SA_INTERRUPT 
flag on request_irq() and handle our own disabling of interrupts.  Evidently 
the new 2.1.x kernels do something special with cli() on SMP machines that we 
may be missing by not handling our own interrupt on/off scenario.  The same 
applies to our queue routine: cli() now does more than just stop interrupts.  
It also makes sure that only one processor is handling any given interrupt, 
so that multiple processors are not working on the same interrupt.

In any case, I think this is what needs to be added.  First, the new flag in 
p->flags.  Then, at the beginning of the aic7xxx_queue() routine we need 
something like the following:

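unsigned long flags;	/* saved interrupt state; distinct from p->flags */

save_flags(flags);
cli();
/* Wait while another non-ISR caller is inside the queue routine. */
while ((p->flags & IN_QUEUE) && !(p->flags & IN_ISR))
{
	sti();
	schedule();
	cli();
}
/* Only a non-ISR caller claims the queue routine. */
if (!(p->flags & IN_ISR))
{
	p->flags |= IN_QUEUE;
}
restore_flags(flags);	/* restore_flags() needs the saved state */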


and at the end:
if (!(p->flags & IN_ISR))
{
	p->flags &= ~IN_QUEUE;
}

Here, we don't want to wipe out the IN_QUEUE flag if we interrupted the queue 
routine with an interrupt.  We also don't want to set the flag if we are 
coming from an ISR, since we don't know whether we interrupted the queue 
routine or not.

This little snippet of code does one very important thing for us: it 
enforces single-threaded operation of the queue routine, except when the 
routine is called from the interrupt handler, in which case the 
single-thread requirement is ignored.  (We do that already; we just have the 
critical sections inside cli() areas, or so we thought.  But there is a race 
condition on SMP boxes in aic7xxx_alloc_scb(), since two processors could end 
up allocating the same scb, causing much confusion.)

An alternative to single threading the entire queue routine would probably be 
to single thread just the alloc_scb() routine and the waiting_queue 
manipulation functions.  I don't think buildscb() or any of the rest of the 
code should have problems, since they all more or less work on local variables 
built from and around the allocated scb and the scsi cmd pointer, which would 
be different in the two running instances.  But I'm no SMP expert, so someone 
else should correct me if I'm wrong here :)
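
A rough user-space sketch of that alternative, for illustration only and not 
driver code: only the free-list pop and push are serialized, and everything 
built around the returned scb stays unlocked.  The names scb_pool, scb_alloc 
and scb_free, the pool size, and the pthread mutex (standing in for whatever 
exclusion the kernel would provide) are all assumptions made for this example.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NUM_SCBS 4              /* hypothetical pool size, illustration only */

struct scb {
    int         index;          /* slot number in the pool */
    struct scb *next;           /* free-list link */
};

struct scb_pool {
    pthread_mutex_t lock;       /* stand-in for kernel-side exclusion */
    struct scb      slots[NUM_SCBS];
    struct scb     *free_list;
};

static void scb_pool_init(struct scb_pool *pool)
{
    pthread_mutex_init(&pool->lock, NULL);
    pool->free_list = NULL;
    /* Push slots in reverse so that slot 0 ends up at the head. */
    for (int i = NUM_SCBS - 1; i >= 0; i--) {
        pool->slots[i].index = i;
        pool->slots[i].next = pool->free_list;
        pool->free_list = &pool->slots[i];
    }
}

/* Pop one free scb; returns NULL when the pool is exhausted.  Only this
 * short section is serialized, so two callers cannot get the same scb. */
static struct scb *scb_alloc(struct scb_pool *pool)
{
    pthread_mutex_lock(&pool->lock);
    struct scb *scb = pool->free_list;
    if (scb != NULL)
        pool->free_list = scb->next;
    pthread_mutex_unlock(&pool->lock);
    return scb;
}

/* Return an scb to the head of the free list. */
static void scb_free(struct scb_pool *pool, struct scb *scb)
{
    assert(scb >= pool->slots && scb < pool->slots + NUM_SCBS);
    pthread_mutex_lock(&pool->lock);
    scb->next = pool->free_list;
    pool->free_list = scb;
    pthread_mutex_unlock(&pool->lock);
}
```

The point of the narrow lock is that the work done on the allocated scb after 
scb_alloc() returns touches only per-command data, so it needs no exclusion.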

Anyway, that's my current guess as to the SMP problems I've seen, but I don't 
have an SMP box to test this out on :(    Maybe I can talk the boss.... :)

To be quite honest, I think there could be some further optimizations using 
spin locks or return locks of some sort in the run_waiting_queues function 
and the queue function.  They would reduce the amount of time that we run 
with interrupts off and increase multi-thread capability, so as to get the 
most performance out of the cards.  I would have to actually sit down and 
think about it for a while, though: it would most likely require a few new 
global static variables to enforce single-threaded operation where it is 
critical, plus some consideration of possible scenarios to make sure we catch 
everything.  The possible solution I posted above is rather inefficient, but 
should work.  I would prefer to come up with something more elegant though.
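
To make the spin lock idea concrete, here is a minimal user-space sketch, 
assuming a simple test-and-set lock built on C11 atomic_flag.  It is not a 
kernel API and not the driver's code; struct spin, struct cmd, the 
waiting_queue layout and the wq_* helpers are hypothetical names chosen for 
the example.  The lock is held only for the few pointer updates on the queue.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Minimal test-and-set spin lock (user-space stand-in). */
struct spin {
    atomic_flag held;
};

static void spin_init(struct spin *s)
{
    atomic_flag_clear(&s->held);
}

static void spin_acquire(struct spin *s)
{
    /* Busy-wait until the flag was previously clear. */
    while (atomic_flag_test_and_set_explicit(&s->held, memory_order_acquire))
        ;
}

static void spin_release(struct spin *s)
{
    atomic_flag_clear_explicit(&s->held, memory_order_release);
}

struct cmd {
    int         id;
    struct cmd *next;
};

struct waiting_queue {
    struct spin lock;
    struct cmd *head;
    struct cmd *tail;
};

static void wq_init(struct waiting_queue *q)
{
    spin_init(&q->lock);
    q->head = NULL;
    q->tail = NULL;
}

/* Append a command; the lock covers only the list manipulation. */
static void wq_push(struct waiting_queue *q, struct cmd *c)
{
    assert(c != NULL);
    c->next = NULL;
    spin_acquire(&q->lock);
    if (q->tail != NULL)
        q->tail->next = c;
    else
        q->head = c;
    q->tail = c;
    spin_release(&q->lock);
}

/* Remove and return the oldest command, or NULL if the queue is empty. */
static struct cmd *wq_pop(struct waiting_queue *q)
{
    spin_acquire(&q->lock);
    struct cmd *c = q->head;
    if (c != NULL) {
        q->head = c->next;
        if (q->head == NULL)
            q->tail = NULL;
    }
    spin_release(&q->lock);
    return c;
}
```

A real driver would also have to keep the interrupt handler from spinning on 
a lock held by the code it interrupted on the same processor, which this 
sketch does not address.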



-- 
*****************************************************************************
* Doug Ledford                      *   Unix, Novell, Dos, Windows 3.x,     *
* dledford@dialnet.net    873-DIAL  *     WfW, Windows 95 & NT Technician   *
*   PPP access $14.95/month         *****************************************
*   Springfield, MO and surrounding * Usenet news, e-mail and shell account.*
*   communities.  Sign-up online at * Web page creation and hosting, other  *
*   873-9000 V.34                   * services available, call for info.    *
*****************************************************************************


