[2188] in linux-scsi channel archive
Re: Problems switching between DDS tapes of different length
daemon@ATHENA.MIT.EDU (Pete Popov)
Mon Jul 21 17:04:25 1997
To: Eckard Koch <czkec@ocag.ch>
cc: linux-scsi@vger.rutgers.edu, linux-tape@vger.rutgers.edu
In-reply-to: Your message of "21 Jul 1997 11:41:47 +0200."
<79pvscg2sk.fsf@lx_kec.ocag.ch>
Date: Mon, 21 Jul 1997 13:58:24 -0700
From: Pete Popov <pete@jones.asd.sel.sony.com>
> Due to your questions I was going again through the whole
> procedures. In the end I found out that my problem is not caused by
> different tape length but by the backup software changing the
> block size for its own tapes. Sending "mt -f /dev/nst0 setblk 0" to
> the tape cured the problem.
> Although the problem is solved I still don't really understand
> what is happening here. Especially the output of the different tape
> tools is a bit confusing (block size, blocking-factor), maybe it's
> just my misinterpretation. Would you be so kind to comment on what is
> happening below?
Certainly, I'll do my best. Let me start with a quick description of how
sequential access devices write data (specifically DDS, AIT and
Exabyte's 8mm drives, since I have no experience with QIC, TRAVAN, etc).
If you take a look at the SCSI Write command (0x0A), you will see a
"Fixed" bit in byte 1 of the CDB. This indicates whether the drive
should write data in "fixed" mode or in "variable" mode. The "transfer
length" can mean two things:
1. In fixed mode, the transfer length indicates the number of
records to be written to tape. Thus, the total transfer would be
transfer_length * current_fixed_record_size; each record written
to tape has the current fixed record length (or the default record
length if the host has not modified it since power up).
2. In variable mode, the transfer length indicates the actual
record length that will be written on tape. Thus, the total transfer
would be transfer_length * 1 (only one record will be written).
Some drives power up in fixed mode, others in variable. If the drive
is in fixed mode, you can still issue variable write commands by
turning off the fixed bit. However, if the drive is in variable mode,
the fixed record length is zero. Therefore, you first have to
define a fixed record length before you can start sending write
commands with the fixed bit set to 1. Otherwise, the drive will
return Check Condition (CC).
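
To make the two meanings of the transfer length concrete, here is a
minimal Python sketch. The function names are made up for illustration;
the CDB layout (opcode 0x0A, Fixed bit in bit 0 of byte 1, 24-bit
transfer length in bytes 2-4, control byte last) is the standard
WRITE(6) layout. The error for a fixed write with no fixed record length
only mimics the Check Condition described above; it does not talk to a
drive.

```python
WRITE_6 = 0x0A  # SCSI WRITE(6) opcode for sequential-access devices


def build_write6_cdb(transfer_length, fixed):
    """Return the 6-byte WRITE(6) CDB as bytes (illustrative helper)."""
    if not 0 <= transfer_length < 1 << 24:
        raise ValueError("transfer length must fit in 24 bits")
    return bytes([
        WRITE_6,
        0x01 if fixed else 0x00,         # byte 1, bit 0: Fixed
        (transfer_length >> 16) & 0xFF,  # bytes 2-4: transfer length, MSB first
        (transfer_length >> 8) & 0xFF,
        transfer_length & 0xFF,
        0x00,                            # control byte
    ])


def total_bytes(transfer_length, fixed, fixed_record_size):
    """Bytes moved by one WRITE, following the two cases above."""
    if fixed:
        if fixed_record_size == 0:
            # Drive is in variable mode: a fixed write is rejected.
            raise ValueError("Check Condition: no fixed record length defined")
        return transfer_length * fixed_record_size  # N records of fixed size
    return transfer_length                          # one record of this length
```

For example, total_bytes(20, True, 10240) gives 204800 (20 records of
10K each), while total_bytes(32768, False, 0) gives 32768 (one 32K
record).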
How does this matter when reading?
Suppose the tape is written with record length 32k (regardless of whether
"fixed" or "variable" writes were used). This means that each record
on tape is of length 32k. If you try to read the tape using any other
length (...well, there are exceptions, but we'll ignore them), the drive
will return CC, and the Illegal Length Indicator (ILI) will be set in
the sense data. From the "residual" information, it's possible for the
software to figure out what the actual record length on tape is (if the
software issued a "variable" read command).
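
As a rough illustration of how software can recover the record length
from the residual, the sketch below parses fixed-format sense data. It
assumes the usual layout (Valid bit in bit 7 of byte 0, ILI in bit 5 of
byte 2, a signed big-endian Information field in bytes 3-6 holding the
requested length minus the actual length). The function name is
hypothetical, and a real driver has more cases to handle.

```python
def actual_record_length(sense, requested_length):
    """Return the record length implied by ILI sense data, else None.

    Only meaningful after a variable-mode read; 'sense' is a bytes
    object holding fixed-format sense data.
    """
    if len(sense) < 7:
        raise ValueError("fixed-format sense data needs at least 7 bytes")
    valid = bool(sense[0] & 0x80)  # byte 0, bit 7: Information field valid
    ili = bool(sense[2] & 0x20)    # byte 2, bit 5: Illegal Length Indicator
    if not (valid and ili):
        return None
    # Bytes 3-6: residual = requested length - actual length (two's complement)
    residual = int.from_bytes(sense[3:7], "big", signed=True)
    return requested_length - residual
```

If the software asked for 65536 bytes and the record on tape was 32K,
the residual is 32768 and the function returns 32768. If it asked for
only 512 bytes, the residual is negative (0xFFFF8200) and the result is
still 32768.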
Here's a common error condition:
1. You write a tape with variable transfers (perhaps by first setting the
block size to zero with the mt -f /dev/nst0 setblk 0 command). The record/block
length is, let's say, 32KB.
2. You power cycle the drive and it comes up in "fixed" mode, its
default mode, with 0x200 as the default fixed record length.
3. Now you try to read the tape, and your software sends a "fixed"
read command. Essentially, you are telling the drive to read N
records (N being the transfer length in the CDB), with each
record being 0x200 in length. The drive starts reading the tape, sees
that the record length is 32KB, and returns CC, with ILI set in the
sense data. Your software crashes.
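
The three steps can be played through with a toy model. This is only a
sketch of the rule described above (any length mismatch gives CC with
ILI); the class and its names are invented for illustration, it ignores
the exceptions mentioned earlier, and a real drive behaves in more
detail than this.

```python
class ToyDrive:
    """Toy model of a tape drive; not a real SCSI implementation."""

    def __init__(self, records, fixed_record_size=0x200):
        self.records = list(records)                # record lengths on tape, in bytes
        self.fixed_record_size = fixed_record_size  # assumed power-up default
        self.pos = 0                                # index of the next record

    def read(self, transfer_length, fixed):
        """Return ("GOOD", bytes_read) or ("CHECK CONDITION", "ILI")."""
        if fixed:
            count, expected = transfer_length, self.fixed_record_size
        else:
            count, expected = 1, transfer_length
        total = 0
        for _ in range(count):
            if self.pos >= len(self.records):
                break                               # ran off the toy tape
            if self.records[self.pos] != expected:
                return ("CHECK CONDITION", "ILI")   # length mismatch
            total += expected
            self.pos += 1
        return ("GOOD", total)


drive = ToyDrive([32 * 1024] * 4)          # step 1: tape holds 32KB records
print(drive.read(20, fixed=True))          # step 3: fixed read, 0x200 expected
print(drive.read(32 * 1024, fixed=False))  # variable read of the right length
```

The fixed read prints ('CHECK CONDITION', 'ILI'); the variable read of
32KB prints ('GOOD', 32768).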
Does all this make any sense?
I'll make a few guesses on the errors below.
> -----------------------------------------------------------------------
> Here is the test setup and the diagnostics I get when trying
> to reproduce the reported problem. I have an autoloader magazine
> loaded with one 90m (slot 1) tape and five 120m tapes (slots 2-6)..
> Hardware compression is switched off by software (mt-dds comp-off).
> The same tar archive has been written to the 90m tape and one of the
> 120m tapes. The system has been rebooted.
> $> mtx -f /dev/nst0 first
> $> mt -f /dev/nst0 status (90m tape)
> SCSI 2 tape drive:
> File number=0, block number=0, partition=0.
> Tape block size 0 bytes. Density code 0x13 (DDS (61000 bpi)).
> Soft error count since last status=0
> General status bits on (41010000):
> BOT ONLINE IM_REP_EN
> $> mt-dds -f /dev/nst0 tell (90m tape)
> first block number is 0
> block size is 20
> block length is 10240
So, it appears that your mtx script set the record length to
10K. The "block size" is most likely the "transfer length" that
it used in the CDB. Thus, the write command would be:
(0x0A, 0x01, 0, 0, 20, 0)
                   |
                   |--------> this is the lsb of the "transfer length" field
This command tells the drive to write 20 records, each one being 10K (if
10K is what the host previously specified).
> $> tar tvf /dev/nst0
> drwxr-xr-x amanda/sys 0 1997-07-21 07:54 amanda/
> drwxr-xr-x amanda/sys 0 1997-05-29 10:19 amanda/snp/
> drwxr-x--- amanda/sys 0 1997-07-16 16:39 amanda/bin/
> -rwsr-x--- root/sys 104975 1997-03-25 17:43 amanda/bin/gtar
> lrwxrwxrwx root/sys 0 1997-03-19 07:00 amanda/bin/load_tape
> ...
>
> $> mtx -f /dev/nst0 next (loading the 120m tape)
> $> mt -f /dev/nst0 status (120m tape)
> SCSI 2 tape drive:
> File number=0, block number=0, partition=0.
> Tape block size 0 bytes. Density code 0x24 (DDS-2).
> Soft error count since last status=0
> General status bits on (41010000):
> BOT ONLINE IM_REP_EN
>
> $> mt-dds -f /dev/nst0 tell (120m tape)
> first block number is 0
> block size is 20
> block length is 10240
> $> tar tvf /dev/nst0 (120m tape)
> drwxr-xr-x amanda/sys 0 1997-07-21 07:54 amanda/
> drwxr-xr-x amanda/sys 0 1997-05-29 10:19 amanda/snp/
> drwxr-x--- amanda/sys 0 1997-07-16 16:39 amanda/bin/
> -rwsr-x--- root/sys 104975 1997-03-25 17:43 amanda/bin/gtar
> lrwxrwxrwx root/sys 0 1997-03-19 07:00 amanda/bin/load_tape
> ...
> So far so good. Changing back and forth between these two tapes
> does not trigger the problem. I found out today that the problem is
> caused by the specific backup software (Amanda) I am using. This
> software changes the block size to 64. Loading such a tape mt-dds
> delivers the following output.
It is not the "block size" that's the problem! Notice the "block
length" below:
> $> mt-dds -f /dev/nst0 tell (120m Amanda Tape)
> first block number is 0
> block size is 64
> block length is 32768
Apparently, your Amanda software wrote the tape using records of 32K
length. This is "good" -- using larger transfers in order to keep
the drive streaming as much as possible.
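
A quick arithmetic check on the two mt-dds outputs quoted so far. The
assumption here, which is not confirmed by anything in this thread, is
that the "block size" figure counts 512-byte units the way tar's
blocking factor does. Both reported pairs are consistent with that
reading; either way, the record length on tape is the "block length"
figure.

```python
TAR_BLOCK = 512  # bytes per tar block; tar's blocking factor counts these


def implied_record_length(block_size):
    """Record length in bytes if 'block size' counts 512-byte units (assumption)."""
    return block_size * TAR_BLOCK


# (block size, block length) pairs as reported by "mt-dds tell" above
reported = {"tar tapes": (20, 10240), "Amanda tape": (64, 32768)}

for name, (block_size, block_length) in reported.items():
    matches = implied_record_length(block_size) == block_length
    print(name, block_size, block_length, matches)
```

Both lines print True in the last column: 20 * 512 = 10240 and
64 * 512 = 32768.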
> The backup usually starts with labeling the tape:
> $> amlabel test test_VOL3
>
> Having done so and changing back to one of the tapes
> with tar archives on it every attempt to read the tape
> produces I/O errors and the syslog says:
> kernel: st0: Incorrect block size.
> The tape itself seems to be detected correctly:
>
> $> mt -f /dev/nst0 status
> SCSI 2 tape drive:
> File number=0, block number=0, partition=0.
> Tape block size 32768 bytes. Density code 0x24 (DDS-2).
> Soft error count since last status=0
> General status bits on (41010000):
> BOT ONLINE IM_REP_EN
> This applies to tar and dd as well as mt and mt-dds
> commands trying to set the block size to 20.
>
> $> tar tv --blocking-factor=20 -f /dev/nst0
> tar: Read error on /dev/nst0: I/O error
> tar: At beginning of tape, quitting now
> tar: Error is not recoverable: exiting now
The "blocking-factor" above most likely sets the "Transfer Length"
field in the CDB. Thus, you are telling the drive to read 20 records
at a time, but the record length has NOT been set correctly yet.
If you had issued "mt -f /dev/nst0 setblk 32768" first, the above
command would have succeeded.
> $> dd if=/dev/nst0 bs=20
> dd: /dev/nst0: I/O error
> 0+0 records in
> 0+0 records out
The drive must have returned CC, with ILI set in the sense data.
> $> mt-dds -b 20 -f /dev/nst0 tell
> dds2tar: I/O error
> Finally, the solution to the problem is:
>
> $> mt -f /dev/nst0 setblk 0
This effectively puts the drive in "variable" mode. Apparently, at
this point tar is smart enough to figure out what the actual record
length on tape is and start sending the proper read commands. If
it's smart enough to do that, then perhaps it makes sense to modify
tar so that it figures out the record length on its own, without the
user having to reset the block size first. It would probably save many
people a lot of heartache. Although, I don't know whether it's tar that
needs to be modified in that case, or the lower-level driver.
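
For what it's worth, here is a sketch of how a reader could find the
record length by itself once the drive is in variable mode. It assumes
that a read with a large enough buffer returns exactly one record, which
is my understanding of how the tape driver behaves in variable mode; I
have not checked whether tar does it this way. The function name and the
256KB upper bound are arbitrary choices for illustration.

```python
def detect_record_length(read_record, max_record=256 * 1024):
    """Read one record with an oversized buffer and return its length.

    read_record(size) must return at most 'size' bytes from the tape.
    Note that this consumes the first record, so the caller must keep
    the data or reposition the tape afterwards.
    """
    first = read_record(max_record)  # variable mode: one whole record comes back
    if not first:
        raise EOFError("no data record at the current tape position")
    return len(first)


# Possible use on a real device (untested, needs "import os"):
#   fd = os.open("/dev/nst0", os.O_RDONLY)
#   length = detect_record_length(lambda size: os.read(fd, size))
```

Once the length is known, the reader can keep issuing reads of exactly
that size for the rest of the file.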
Try this on your Amanda tape: mt -f /dev/nst0 setblk 32768. That will
work also.
I hope this helps. SCSI does not make things intuitively obvious.
Pete