[1854] in linux-scsi channel archive
Re: Latest aicxxx driver (fwd)
daemon@ATHENA.MIT.EDU (Doug Ledford)
Mon May 12 12:47:17 1997
To: yuri mironoff <yuri@buster.rgti.com>
cc: aic7xxx@FreeBSD.ORG, linux-scsi@vger.rutgers.edu
In-reply-to: Your message of "Mon, 12 May 1997 11:21:47 EDT."
<Pine.A41.3.95.970512111830.111540B-100000@bdmg30.rgti.com>
Date: Mon, 12 May 1997 11:43:25 -0500
From: Doug Ledford <dledford@dialnet.net>
--------
> Sorry - forgot configuration:
>
> Linux 2.0.30 on a Dual PPro
> 2940UW + IBM DORS
> May 8 driver
>
> > Hi!
> >
> > Got the latest driver and rushed home to enable SCB paging. The machine
> > locked solid running bonnie. :(((
> >
> > Regards,
> >
> > Y.
>
Dan, I think we might have a generic problem with the latest SMP stuff. I
assume this guy is running SMP kernels, and I know Richard Johnson was having
problems (very bad ones) under the latest 2.1.x stuff with SMP kernels, with
both the aic7xxx driver that is stock in the kernel and the May 8 driver. I
know the latest 2.1.x kernels have broken the mid-level SCSI code's protection
against re-entrance into the driver, and it might be broken under 2.0.30 as
well. So, knowing that we possibly have a generic SMP problem, we probably
need to do the following two things:

First, just as we have a flag for p->flags & IN_ISR, we are going to need a
similar flag for IN_QUEUE. (I think the latest kernels, and possibly 2.0.30 as
well, have on SMP machines broken the once-held assumption that our queue
routine is non-re-entrant except when we get an interrupt during our only
allowed queue thread.) Second, we probably do need to do away with the
SA_INTERRUPT flag on request_irq() and handle our own disabling of
interrupts. Evidently there is some special magic being performed in the new
2.1.x kernels related to cli() usage on SMP machines that we may be missing by
not handling our own interrupt on/off scenario. The same applies to our queue
routine: cli() now does magical stuff beyond just stopping interrupts. It
also makes sure only one processor is trying to handle any given interrupt,
to avoid having multiple processors working on the same interrupt.

In any case, I think this is what needs to be added. First, the new flag in
p->flags. Then, at the beginning of the aic7xxx_queue() routine we need
something like the following:

  save_flags(flags);
  cli();
  /* Wait while IN_QUEUE is held and we were not called from the isr. */
  while ((p->flags & IN_QUEUE) && !(p->flags & IN_ISR))
  {
    sti();
    schedule();
    cli();
  }
  if (!(p->flags & IN_ISR))
  {
    p->flags |= IN_QUEUE;
  }
  restore_flags(flags);

and at the end:

  if (!(p->flags & IN_ISR))
  {
    p->flags &= ~IN_QUEUE;
  }

Here, we don't want to wipe out the IN_QUEUE flag if we interrupted the queue
routine with an interrupt. We also don't want to set the flag if we are
coming from an isr, since we don't know whether we interrupted the queue
routine or not.

This little snippet of code does one very important thing for us: it
enforces single-threaded operation of the queue routine, except when the
routine is called from the interrupt handler, in which case it ignores the
single-thread requirement. (We do that already: we keep the critical sections
inside cli() areas, or so we thought. But there is a race condition on SMP
boxes in aic7xxx_alloc_scb(), since two processors could end up allocating
the same scb at the same time, causing much confusion, etc.)
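
To make the entry and exit rules easier to check by hand, here is a small
user-space model of just the flag logic. It is only a sketch: the bit values
for IN_ISR and IN_QUEUE and the helper names queue_must_wait(), queue_enter()
and queue_exit() are made up for illustration and are not taken from the
driver, and it models none of the cli()/sti()/schedule() behaviour.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit values; the driver's real definitions may differ. */
#define IN_ISR   0x01u
#define IN_QUEUE 0x02u

/* True when a caller has to wait before entering the queue routine:
 * IN_QUEUE is already held and we are not running from the isr. */
static bool queue_must_wait(unsigned int flags)
{
    return (flags & IN_QUEUE) && !(flags & IN_ISR);
}

/* Flags after entry: only a non-isr caller claims IN_QUEUE. */
static unsigned int queue_enter(unsigned int flags)
{
    assert(!queue_must_wait(flags));
    if (!(flags & IN_ISR))
        flags |= IN_QUEUE;
    return flags;
}

/* Flags after exit: only a non-isr caller releases IN_QUEUE. */
static unsigned int queue_exit(unsigned int flags)
{
    if (!(flags & IN_ISR))
        flags &= ~IN_QUEUE;
    return flags;
}
```

With these definitions, a call made from the isr while IN_QUEUE is set leaves
the flags untouched on both entry and exit, so the interrupted queue thread
still owns IN_QUEUE when it resumes.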

An alternative to single threading the entire queue routine would probably be
to single thread just the alloc_scb() routine and the waiting_queue
manipulation functions. I don't think buildscb() or any of the rest of the
code should have problems, since they all more or less work on local variables
built from and around the allocated scb and the scsi cmd pointer, which would
be different in the two running instances. But I'm no SMP expert, so someone
else should correct me if I'm wrong here :)
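
As a rough user-space illustration of that narrower approach, the sketch below
serializes only the search-and-claim step of an scb pool and leaves the build
step outside the lock. Everything in it is an assumption made for the example:
struct scb, struct scb_pool, NUM_SCBS and the pool_*() helpers are hypothetical
and not the driver's real data structures, and a C11 atomic_flag busy-wait
stands in for whatever the kernel would actually use. A real driver would also
have to keep the interrupt handler from spinning on a lock held by the code it
interrupted.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define NUM_SCBS 4   /* hypothetical pool size, for illustration only */

/* Hypothetical stand-in for the driver's scb; not the real struct. */
struct scb {
    int tag;          /* index of this scb in the pool */
    int in_use;       /* nonzero once handed out */
    void *cmd;        /* the scsi cmd this scb was built for */
};

struct scb_pool {
    atomic_flag lock;             /* guards in_use for every scb */
    struct scb scbs[NUM_SCBS];
};

static void pool_init(struct scb_pool *p)
{
    atomic_flag_clear(&p->lock);
    for (int i = 0; i < NUM_SCBS; i++) {
        p->scbs[i].tag = i;
        p->scbs[i].in_use = 0;
        p->scbs[i].cmd = NULL;
    }
}

static void pool_lock(struct scb_pool *p)
{
    while (atomic_flag_test_and_set_explicit(&p->lock, memory_order_acquire))
        ;   /* busy-wait until the current holder releases the lock */
}

static void pool_unlock(struct scb_pool *p)
{
    atomic_flag_clear_explicit(&p->lock, memory_order_release);
}

/* Only the search-and-claim step is serialized, so two callers can
 * never be handed the same scb. Returns NULL when the pool is empty. */
static struct scb *alloc_scb(struct scb_pool *p)
{
    struct scb *found = NULL;

    pool_lock(p);
    for (int i = 0; i < NUM_SCBS; i++) {
        if (!p->scbs[i].in_use) {
            p->scbs[i].in_use = 1;
            found = &p->scbs[i];
            break;
        }
    }
    pool_unlock(p);
    return found;
}

static void free_scb(struct scb_pool *p, struct scb *s)
{
    pool_lock(p);
    s->cmd = NULL;
    s->in_use = 0;
    pool_unlock(p);
}

/* Runs outside the lock: it touches only the caller's own scb and cmd. */
static void build_scb(struct scb *s, void *cmd)
{
    assert(s->in_use);
    s->cmd = cmd;
}
```

The point of the split is that the time spent holding the lock is limited to
the pool scan, while the per-command setup proceeds in parallel on each
processor.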

Anyway, that's my current guess as to the SMP problems I've seen, but I don't
have an SMP box to test this out on :( Maybe I can talk the boss.... :)

To be quite honest, I think there could be some further optimizations that
utilize spin locks of some sort, or return locks, one of the two, to optimize
the run_waiting_queues function and the queue function. The goal would be to
reduce the amount of time that we run with interrupts off and increase
multi-thread capability, so as to get the most performance out of the cards.
I would have to actually sit down and think about it for a while, though: it
would most likely require a few new global static variables to enforce
single-threaded operation where it is critical, plus some consideration of
possible scenarios to make sure we catch everything. The possible solution I
posted above is rather inefficient, but should work. I would prefer to come
up with something more elegant though.
--
*****************************************************************************
* Doug Ledford * Unix, Novell, Dos, Windows 3.x, *
* dledford@dialnet.net 873-DIAL * WfW, Windows 95 & NT Technician *
* PPP access $14.95/month *****************************************
* Springfield, MO and surrounding * Usenet news, e-mail and shell account.*
* communities. Sign-up online at * Web page creation and hosting, other *
* 873-9000 V.34 * services available, call for info. *
*****************************************************************************