[573] in linux-scsi channel archive
st.c driver errors can get lost!
daemon@ATHENA.MIT.EDU (Richard Waltham)
Fri Sep 1 05:27:28 1995
From: Richard Waltham <dormouse@farsrobt.demon.co.uk>
To: linux-scsi@vger.rutgers.edu
Date: Fri, 1 Sep 1995 01:37:07 +0100 (BST)
Over the last few weeks I've been developing a SCSI peripheral that looks
like a multiple-LUN tape device, and I've been using Linux to test it.
It's brought to light a few problems with the st.c driver. At least one may
be due to the low-level driver fdomain.c, but since I don't understand how the
various bits of the SCSI system interact, I can't be sure. Hence the
questions.
1. At boot-up the multiple tape LUNs are detected and tape devices are
allocated, e.g. /dev/st0, /dev/st1, etc. However, the standard st.c driver
will only talk to the physical device at LUN 0, whichever device I try
to use. My SCSI adaptor, a Future Domain 1680, does not use disconnect and
doesn't send an Identify message, so no LUN information reaches the drive.
The st.c driver doesn't put the LUN in the CDB either.
I have patched my st.c driver to put the LUN into byte 1 of the CDB (the
byte after the opcode), as is done in the sd.c driver, and I can now
successfully access tape devices on separate LUNs. Should I submit the
patches for inclusion in the official Linux releases? If so, to whom? Is
anyone else interested in seeing them? Should I post them here?
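For anyone who wants to try the same thing: in SCSI-1/SCSI-2 style CDBs the LUN lives in bits 7-5 of CDB byte 1 (byte 0 being the opcode). The sketch below is not the st.c patch itself; `cdb_set_lun` and `build_write6` are hypothetical helper names used only to show where the bits go, using the sequential-access WRITE(6) command (opcode 0x0A, Fixed bit in bit 0 of byte 1, 24-bit transfer length in bytes 2-4).

```c
#include <assert.h>
#include <string.h>

/* Opcode for WRITE(6) on a sequential-access (tape) device. */
#define WRITE_6 0x0a

/* Hypothetical helper: put a 3-bit LUN into bits 7-5 of CDB byte 1,
 * keeping the low 5 bits (bit 0 is the Fixed bit for tape commands). */
static void cdb_set_lun(unsigned char *cdb, unsigned lun)
{
    assert(lun < 8);                 /* the CDB LUN field is only 3 bits wide */
    cdb[1] = (unsigned char)((cdb[1] & 0x1f) | (lun << 5));
}

/* Hypothetical helper: build a WRITE(6) CDB for `count` blocks in
 * fixed-block mode (or `count` bytes in variable-block mode) for `lun`. */
static void build_write6(unsigned char cdb[6], unsigned lun,
                         unsigned long count, int fixed)
{
    memset(cdb, 0, 6);
    cdb[0] = WRITE_6;
    cdb[1] = fixed ? 0x01 : 0x00;
    cdb[2] = (unsigned char)((count >> 16) & 0xff);  /* transfer length, MSB first */
    cdb[3] = (unsigned char)((count >> 8) & 0xff);
    cdb[4] = (unsigned char)(count & 0xff);
    cdb_set_lun(cdb, lun);           /* LUN goes in last so it isn't overwritten */
}
```

For example, a fixed-block write of 10 blocks to LUN 1 gives CDB byte 1 = 0x21 (LUN bits 0x20 plus the Fixed bit).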
2. Following on from 1., I presume it is up to the low-level adaptor driver
to issue an Identify message, right? That would bring the fdomain driver
closer to the SCSI-II spec. Disconnection need not be supported as long as
the disconnect-privilege bit is left clear in the Identify message, so the
fdomain driver would not have to be taught to disconnect, although I want to
have a stab at that at some stage. As an aside, any suggestions or warnings
on accomplishing this would be welcome.
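For reference, the SCSI-2 IDENTIFY message is a single byte: bit 7 set marks it as IDENTIFY, bit 6 (DiscPriv) grants the target permission to disconnect, and bits 2-0 carry the LUN. The initiator normally sends it in the MESSAGE OUT phase immediately after selection, with ATN asserted. The sketch below only builds the byte; `identify_msg` is a hypothetical name, and nothing here is taken from fdomain.c.

```c
#include <assert.h>

#define IDENTIFY_BASE     0x80  /* bit 7: this is an IDENTIFY message */
#define IDENTIFY_DISCPRIV 0x40  /* bit 6: target is allowed to disconnect */

/* Hypothetical helper: build the IDENTIFY byte for `lun`. Leaving
 * allow_disconnect at 0 keeps DiscPriv clear, so the target should not
 * disconnect and the host driver need not handle reselection. */
static unsigned char identify_msg(unsigned lun, int allow_disconnect)
{
    assert(lun < 8);            /* bits 2-0 carry the LUN */
    return (unsigned char)(IDENTIFY_BASE |
                           (allow_disconnect ? IDENTIFY_DISCPRIV : 0) |
                           lun);
}
```

So a driver that never wants disconnection would send 0x80 for LUN 0, 0x82 for LUN 2, and so on.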
3. A much more serious problem is that in certain circumstances it appears
possible to lose errors reported by a SCSI tape device during writing.
For instance, it is possible to tar to tape, have the device report an
error, and never see that error other than possibly as a short message in
/var/adm/messages. And how often do people look there to check whether
tar finished OK?
The standard kernel distributions have the st.c driver set up to write
asynchronously. Write errors are checked for at the start of the next write
command, when any error from the previous write is reported. If the tar is
only one block long in unbuffered variable-block mode, or several/many blocks
long in buffered mode, the error is not detected until the close (device
release?), and the close routine does not return an error. Also, the
closing filemark(s) are written during the close routine, so an error while
writing the filemark is likewise not reported other than by another short
message in /var/adm/messages. Looking at another driver, it appears that
release does not return any value. Is that correct? If so, is there any way
round this, or do we have to live with it?
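The deferred-reporting problem can be shown with a small user-space model. This is not st.c code: `struct tape_model`, `model_write` and `model_close` are hypothetical names, and -EIO simply stands in for whatever error the driver would return. Each write returns the error left over from the previous one, and the close, like a release routine returning void, can only log a pending error.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Hypothetical model of an asynchronously written tape: the outcome of a
 * write is only known when the next call into the driver is made. */
struct tape_model {
    int pending_error;   /* error from the last write, not yet reported */
    int lost_errors;     /* errors dropped because close cannot return them */
};

/* Start a new write. Returns the error left over from the PREVIOUS write
 * (0 if none); the outcome of this write (`fails`) is only recorded. */
static int model_write(struct tape_model *t, int fails)
{
    int previous;

    assert(t != NULL);
    previous = t->pending_error;
    t->pending_error = fails ? -EIO : 0;
    return previous;
}

/* Close the device. With no return value, a pending error can only be
 * logged (the "short message in /var/adm/messages") and counted here. */
static void model_close(struct tape_model *t)
{
    assert(t != NULL);
    if (t->pending_error != 0) {
        fprintf(stderr, "st: write error detected at close\n");
        t->lost_errors++;
    }
    t->pending_error = 0;
}
```

In this model a failing last write returns 0 to the caller and the error only shows up in `lost_errors`, while a failure followed by one more write is reported, one call late.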
With tape devices that use large built-in buffers, and with the st.c driver
doing async writes to a device in buffered mode, it is possible to appear to
write a load of data without error, only to read it back and get garbage
because an error went unreported.
A pretty nasty scenario for those relying on reliable backups. Sure, it works
well almost all the time, but when it doesn't you need to know that it's gone
wrong. For more reliable error reporting, disable asynchronous writes and
buffered mode in the SCSI st.c driver.
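A possible user-space workaround, short of changing the driver, is to write the closing filemark explicitly with the MTIOCTOP/MTWEOF ioctl from <sys/mtio.h> before calling close(), because an ioctl, unlike the release routine, can return an error. This is an untested sketch: it assumes the driver flushes its write buffer (and reports any deferred write error) before carrying out an MTIOCTOP request, and that it then does not add a second filemark at close; `finish_tape_and_close` is a hypothetical name.

```c
#include <assert.h>
#include <sys/ioctl.h>
#include <sys/mtio.h>
#include <unistd.h>

/* Hypothetical helper: write the closing filemark ourselves, then close.
 * Returns 0 on success, -1 if either the filemark or the close failed. */
static int finish_tape_and_close(int fd)
{
    struct mtop op;
    int result = 0;

    assert(fd >= 0);
    op.mt_op = MTWEOF;      /* write filemark(s) */
    op.mt_count = 1;        /* just one */
    if (ioctl(fd, MTIOCTOP, &op) < 0)
        result = -1;        /* flush or filemark write reported an error */
    if (close(fd) < 0)
        result = -1;
    return result;
}
```

A backup script would then treat a -1 here the same as a failed write, rather than trusting the exit status of tar alone.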
Finally, I've read many FAQs and the Kernel Hackers' Guide, and looked at
driver source code, and I still haven't managed to figure out the SCSI driver
hierarchy. Is there a doc anywhere that details how the various drivers plug
together, e.g. where calls originate and which bits of the SCSI hierarchy are
involved on the way to the final adaptor driver? It would certainly make it
much easier for me to understand how the whole lot ticks.
Thanks for reading all the way to the end - you did, didn't you? :-)
Richard
PS I'm still looking for data on the Future Domain TMC-1800 Host Adaptor
Chip to go with the 18c30/18c50 specs. Can anyone help out with this one?