[192] in linux-scsi channel archive
SCSI Reset Functions.
daemon@ATHENA.MIT.EDU (Leonard N. Zubkoff)
Mon May 22 04:17:10 1995
Date: Mon, 22 May 1995 00:30:51 -0700
From: "Leonard N. Zubkoff" <lnz@dandelion.com>
To: eric@aib.com
Cc: linux-scsi@vger.rutgers.edu, drew@boulder.openware.com
In-Reply-To: Eric Youngdale's message of Sun, 21 May 95 16:23 EDT <m0sDHWf-0009RgC@aib.com>
   Date: Sun, 21 May 95 16:23 EDT
   From: eric@aib.com (Eric Youngdale)

   In theory, yes. In practice, they might not. The problem is
   that UNIT_ATTENTION also comes with a change in media. I would
   suggest that a flag be set in the Scsi_Device structure indicating a
   reset. This could be set by the low level reset function. The upper
   levels of device drivers would decide how to handle this, and when to
   clear it.
I think that suggestion makes sense.
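A minimal sketch of how such a flag might look, with invented names throughout (the actual Scsi_Device layout and reset entry points would differ):

```c
#include <stdbool.h>

/* Hypothetical per-device structure fragment; the field name is
 * illustrative, not the real Scsi_Device layout. */
typedef struct scsi_device_sketch {
    int id;
    bool was_reset;  /* set by the low-level driver when a reset hits */
} scsi_device_sketch;

/* The low-level reset handler marks every device on the bus. */
static void mark_devices_reset(scsi_device_sketch *devs, int ndevs)
{
    for (int i = 0; i < ndevs; i++)
        devs[i].was_reset = true;
}

/* The upper-level driver decides how to react and when to clear the
 * flag, e.g. after replaying mode settings or failing the command. */
static bool check_and_clear_reset(scsi_device_sketch *dev)
{
    bool r = dev->was_reset;
    dev->was_reset = false;
    return r;
}
```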
   Yes, I agree. Another concern I have had is with cache
   coherency for caching controllers and the cache on disk drives that do
   write caching. If putting a cdrom in the drive that has been written
   on with a sharpie marker corrupts the hard drive (same goes for a tape
   with a bad spot on it), then I would rather not do it at all than
   do it wrong. To put it another way, it needs to be carefully
   thought out first.
Indeed. My present temporary strategy has been to implement and test three
options as well as I can (controlled by #ifdef's in the source): (1) performing
a Host Adapter Hard Reset which also causes SCSI Bus Reset, (2) Sending Bus
Device Reset to the single Target having problems, and (3) doing nothing and
hoping for the best (what the current driver does). I have chosen (1) as the
default, as I'm more concerned with keeping my system up and running than
leaving a failing CD-ROM still accessible. I can always reboot cleanly and
quickly as long as the basic disk I/O is working.
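The three compile-time options can be sketched as follows; the macro and function names here are made up for illustration, not taken from the driver source:

```c
#include <string.h>

/* Illustrative only: three reset strategies selected by #ifdef, as in
 * the experimental driver.  Exactly one should be defined. */
#define RESET_HARD 1       /* (1) host adapter hard reset (implies SCSI
                                  Bus Reset) -- the chosen default      */
/* #define RESET_BDR  1 */ /* (2) Bus Device Reset to the one failing
                                  target                                */
/* #define RESET_NONE 1 */ /* (3) do nothing and hope for the best      */

static const char *chosen_reset_strategy(void)
{
#if defined(RESET_HARD)
    return "hard reset";
#elif defined(RESET_BDR)
    return "bus device reset";
#else
    return "none";
#endif
}
```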
By the way, how safe or unsafe is enabling write caching? I've noticed that
scsi-config warns against this, but I have no real information on whether or
not enabling write caching is a good idea. What are the dangers of doing so?
Assuming no power failures or a working UPS, is there any real danger?
   They might spin down - I am not sure. They should spin up again
   if you need them, so this is not really an issue.
Further investigation shows this is indeed the case, but a fair number of
commands afterwards may result in a NOT_READY error. After a bus reset we will
need to handle waiting for the drive to become ready again.
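One way to handle that wait is a bounded TEST UNIT READY retry loop; the device below is simulated and all names and the retry limit are invented for illustration:

```c
#include <stdbool.h>

#define TUR_RETRIES 30  /* illustrative limit */

/* Simulated drive: reports NOT_READY for the first `busy` commands
 * after a reset, then becomes ready. */
typedef struct { int busy; } sim_drive;

/* Stand-in for issuing TEST UNIT READY; true means the unit is ready. */
static bool issue_tur(sim_drive *d)
{
    if (d->busy > 0) { d->busy--; return false; }
    return true;
}

/* Keep retrying TEST UNIT READY until the drive comes back or we
 * give up; a real driver would also sleep between attempts. */
static bool wait_until_ready(sim_drive *d)
{
    for (int i = 0; i < TUR_RETRIES; i++)
        if (issue_tur(d))
            return true;
    return false;
}
```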
   In theory you should get better performance with this. Have you
   observed any improvement :-)?
I haven't really done any performance testing of the new driver yet, preferring
first to concentrate on robustness. However, I was also curious about this
result, as well as the baseline performance of my new driver compared to the
old one, so I've run a small benchmark to get some idea about how much this
helps.
For a disk-intensive benchmark I was looking for something that wasn't as
contrived as iozone etc., so I chose the following: I have a script which
starts with the raw gzipped tar files for X11R6 (xc and contrib) and XFree86
3.1.1 and unpacks and patches a full source tree. Approximately 67228KB of
gzipped files expand into 306606KB of source tree. Since I need to delete the
source tree between runs anyway, I'll present those numbers as well. The user
and system times were all within 1% of each other, so I'll only report on the
elapsed times, which is a pleasant discovery as I was afraid the new driver
would turn out to be slower.
The relevant configuration of my test machine is: AMD 486DX4-100, Genoa
TurboExpress motherboard with SiS471 VLB chipset tweaked for maximum
performance, 512KB 15ns cache, 32MB 60ns memory, BusLogic 445C VL SCSI, and
Seagate ST12550N 2.1GB hard disk. I have not done any performance testing on
my main Pentium system.
The Linux kernels I tested are all identical and pretty much vanilla 1.2.8
except for the BusLogic driver.
Description                            XFree86.create  rm -rf xc contrib
====================================== ============== =================
Present BusLogic Driver                       9:19.70           3:44.78
LNZ Driver, Tag Queuing Disabled              9:05.10           3:31.90
LNZ Driver, Tag Queuing Enabled               8:34.58           3:09.75
So, a rough calculation shows that in this one test the new driver is about
2.6% faster with tag queuing disabled, and 8.1% faster with tag queuing
enabled. Comparing the same driver without and with tag queuing, tag queuing
itself is worth about 5.6% in this example.
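Those figures follow from the elapsed times in the XFree86.create column, converted to seconds:

```c
/* 9:19.70 = 559.70s (present driver), 9:05.10 = 545.10s (LNZ, no tag
 * queuing), 8:34.58 = 514.58s (LNZ, tag queuing).  Speedup of a new
 * time relative to a baseline, in percent: */
static double pct_faster(double base_s, double new_s)
{
    return 100.0 * (base_s - new_s) / base_s;
}
/* pct_faster(559.70, 545.10) is about 2.6,
 * pct_faster(559.70, 514.58) is about 8.1,
 * pct_faster(545.10, 514.58) is about 5.6. */
```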
Who knows whether the higher level parts of the SCSI Subsystem are able to take
full advantage of tag queuing, though, as they've never been tuned for it.
Once I'm happy with the robustness of the driver, I plan to investigate further
the pattern of commands the driver is being handed, issues of scatter/gather
limits, etc. For example, the current strategy of a single cmd_per_lun value
seems likely to be a limitation: a fast disk could reasonably have many
commands queued, whereas there's not much point in that for a slow CD-ROM.
   Have you tried getting a command abort to work? On the 1542 this just
   made things worse, so it is a no-op in the 1542 driver.
Well, I've implemented the code for this, but I've never actually seen it get
used. The BusLogic cards are supposed to be able to abort commands, though it
is not recommended to do so when using Tag Queuing, so I decline in that case.
   Before we start coding, we should probably work out the exact
   details of how we handle a few things:
Indeed. It would also be good if the precise options available to the author
of a driver were explained somewhere. I've done my best based on reading the
comments and looking at some of the other drivers, but I still wouldn't say I
really understand this area completely.
   * Who decides whether we should perform a bus reset or a bus device
     reset, and under what conditions should we perform the bus reset?
I think the high level code should be making the decisions about how bad a
state the SCSI subsystem is in. Should we have a number of severity levels and
leave it up to the driver's author how to map each one onto what the board is
capable of, or do we want to specify more precisely what should be done?
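As one possible shape for such severity levels (all names invented here): the high level picks a level, and each driver maps it onto what its board can actually do:

```c
/* Hypothetical recovery severity levels, in increasing order of
 * disruption.  Names are illustrative only. */
typedef enum {
    SCSI_RECOVER_RETRY,        /* just reissue the command           */
    SCSI_RECOVER_ABORT,        /* abort the single stuck command     */
    SCSI_RECOVER_DEVICE_RESET, /* Bus Device Reset to one target     */
    SCSI_RECOVER_BUS_RESET,    /* reset the whole SCSI bus           */
    SCSI_RECOVER_HOST_RESET    /* hard reset the host adapter too    */
} scsi_recovery_level;

/* A driver whose board cannot abort individual commands (like the
 * 1542, where abort is a no-op) might escalate to the next level. */
static scsi_recovery_level map_level(scsi_recovery_level requested,
                                     int can_abort)
{
    if (requested == SCSI_RECOVER_ABORT && !can_abort)
        return SCSI_RECOVER_DEVICE_RESET;
    return requested;
}
```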
   For this, I was thinking that we could check to see whether all of
   the other active commands on the bus have timed out - if so then
   we could consider it safe to perform the bus reset.
That would certainly make it safe. In any event, when all else fails we should
certainly execute the most thorough reset we can.
   * If we send a bus reset who sends the command completion notification
     to the other outstanding commands on the bus? scsi.c, or the
     low level driver?

     How do we handle boards that keep track of the outstanding commands
     and automatically restart them? How about boards that automatically
     report the commands as failed with a DID_RESET message? How about
     boards that completely flush all memory of the outstanding commands?
     The driver itself must be able to account for these differences,
     so the author of the driver must know what the board is going to
     do.
Exactly. If I'm going so far as to reset the bus, I figure I might as well
give the card a hard reset as well just in case there's anything wedged. So
driver authors will need to have all possibilities available, and a way to
indicate what they've chosen to do.
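One way to express that indication is a per-driver disposition value telling the mid-level what the board does with commands outstanding at reset time; the names below are invented:

```c
/* Hypothetical declaration of a board's behavior after a bus reset,
 * covering the three cases from the discussion above. */
typedef enum {
    RESET_BOARD_RESTARTS,  /* board re-queues outstanding commands itself */
    RESET_BOARD_REPORTS,   /* board completes each with DID_RESET         */
    RESET_BOARD_FORGETS    /* board flushes all state; the mid-level must
                              synthesize DID_RESET completions itself     */
} reset_disposition;

/* The mid-level only needs to complete the commands in the last case. */
static int midlevel_must_complete(reset_disposition d)
{
    return d == RESET_BOARD_FORGETS;
}
```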
   * For removable media, should we attempt to relock the door?
     Should we pass a message up indicating doorlock required?
     This requires that another SCSI command be sent, so we need
     to be careful here. Currently if we get a message from the
     device indicating a reset has taken place, we retry the command.
Perhaps we should save whatever state information we can at the high level, and
then restore this state if possible when the next command for that device comes
through? I envision a state vector something like that maintained for TTYs, so
that state-setting ioctl's are not completely ephemeral.
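Such a state vector might look like the following, by analogy with the termios structure kept for TTYs; every field and function name here is illustrative:

```c
/* Hypothetical saved per-device state, replayed after a reset so that
 * state-setting ioctls are not ephemeral. */
typedef struct {
    int block_size;   /* e.g. tape block size from MODE SELECT       */
    int density_code;
    int door_locked;  /* PREVENT/ALLOW MEDIUM REMOVAL state          */
    int dirty;        /* set by a reset; state needs replaying       */
} scsi_saved_state;

/* Record what a state-setting ioctl established. */
static void note_ioctl(scsi_saved_state *s, int block_size, int locked)
{
    s->block_size = block_size;
    s->door_locked = locked;
}

static void on_reset(scsi_saved_state *s) { s->dirty = 1; }

/* Called before the next command for the device; returns 1 if state
 * had to be restored.  Reissuing MODE SELECT etc. is elided here. */
static int restore_if_needed(scsi_saved_state *s)
{
    if (!s->dirty)
        return 0;
    s->dirty = 0;
    return 1;
}
```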
   * Tape drives. I suggest that all access should fail after a reset
     until the user rewinds the tape, or we get a media change. This
     could be tricky, because a UNIT ATTENTION is also used to indicate
     a media change, so it might be difficult to tell whether or not
     the tape position has been lost. Do we want to force the
     user to rewind in all cases? Can we query the device to find
     out if it is at BOT, and if so clear the flag, else fail the
     command?
It appears there is a READ POSITION SCSI command which, if implemented, can
tell us if we are at the beginning of a partition or not.
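For illustration, the short-form READ POSITION CDB (operation code 0x34 in SCSI-2) and the BOP bit in the first byte of the returned data could be handled as below; actually issuing the command through the mid-level is omitted:

```c
#include <stdint.h>
#include <string.h>

#define READ_POSITION 0x34  /* SCSI-2 operation code */

/* Build a short-form READ POSITION CDB: the remaining fields are
 * reserved/zero in the short form. */
static void build_read_position_cdb(uint8_t cdb[10])
{
    memset(cdb, 0, 10);
    cdb[0] = READ_POSITION;
}

/* In the short-form returned data, bit 7 of byte 0 is BOP
 * (beginning of partition). */
static int at_beginning_of_partition(const uint8_t *data)
{
    return (data[0] & 0x80) != 0;
}
```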
Leonard