[3023] in linux-scsi channel archive
problems with scsi disk on AHA 2940
daemon@ATHENA.MIT.EDU (Michael Granzow)
Sun Jan 4 07:00:09 1998
Date: Sun, 4 Jan 98 12:18:01 +0100
From: Michael Granzow <mg@medi.physik.uni-oldenburg.de>
To: linux-scsi@vger.rutgers.edu
here's my system:
linux 2.0.32 (SMP) on a
tyan S1668 Dual PentiumPro ATX 440Fx
with 2 PPro 200MHz and 64 MB RAM
adaptec AHA 2940 UW scsi controller
2 Quantum XP34300W 4.3 GB scsi disks.
here's my problem:
part of a cron job that runs every day in the early morning is to
update the database for the `locate' command. usually, this doesn't
cause trouble. beginning on the 26th of dec, however, it caused the
following kernel messages:
Dec 26 06:43:56 deneb kernel: scsi0: MEDIUM ERROR on channel 0, id 1, lun 0, CDB: Read (6) 0f 40 7d 02 00
Dec 26 06:43:56 deneb kernel: Current error sd08:11: sense key Medium Error
Dec 26 06:43:56 deneb kernel: Additional sense indicates Unrecovered read error
Dec 26 06:43:56 deneb kernel: scsidisk I/O error: dev 08:11, sector 999486, absolute sector 999549
Dec 26 06:43:56 deneb kernel: EXT2-fs error (device 08:11): ext2_read_inode: unable to read inode block - inode=125009, block=499743
Dec 26 06:44:00 deneb kernel: scsi0: MEDIUM ERROR on channel 0, id 1, lun 0, CDB: Read (6) 0f 40 7d 02 00
Dec 26 06:44:00 deneb kernel: Current error sd08:11: sense key Medium Error
Dec 26 06:44:00 deneb kernel: Additional sense indicates Data synchronization mark error
Dec 26 06:44:00 deneb kernel: scsidisk I/O error: dev 08:11, sector 999486, absolute sector 999549
Dec 26 06:44:00 deneb kernel: EXT2-fs error (device 08:11): ext2_read_inode: unable to read inode block - inode=125010, block=499743
Dec 26 06:44:03 deneb kernel: scsi0: MEDIUM ERROR on channel 0, id 1, lun 0, CDB: Read (6) 0f 40 7d 02 00
Dec 26 06:44:03 deneb kernel: Current error sd08:11: sense key Medium Error
Dec 26 06:44:03 deneb kernel: Additional sense indicates Data synchronization mark error
Dec 26 06:44:03 deneb kernel: scsidisk I/O error: dev 08:11, sector 999486, absolute sector 999549
Dec 26 06:44:03 deneb kernel: EXT2-fs error (device 08:11): ext2_read_inode: unable to read inode block - inode=125016, block=499743
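for reference, the CDB bytes in these messages can be decoded by hand. the following is only a minimal sketch: it assumes the kernel prints the five bytes that follow the Read (6) opcode, and the helper name `decode_read6` is made up for illustration. under that assumption the logical block address comes out equal to the "absolute sector" in the log.

```python
def decode_read6(tail_hex):
    """Decode the five bytes that follow the Read (6) opcode (0x08).

    Layout assumed here (SCSI-2 Read (6)):
      byte 0: LUN in the top 3 bits, LBA bits 20..16 in the low 5 bits
      byte 1: LBA bits 15..8
      byte 2: LBA bits 7..0
      byte 3: transfer length in blocks (0 means 256)
      byte 4: control byte
    """
    b = [int(x, 16) for x in tail_hex.split()]
    if len(b) != 5:
        raise ValueError("expected 5 bytes after the opcode")
    return {
        "lun": b[0] >> 5,
        "lba": ((b[0] & 0x1F) << 16) | (b[1] << 8) | b[2],
        "blocks": b[3] or 256,
        "control": b[4],
    }


cdb = decode_read6("0f 40 7d 02 00")
print(cdb["lba"], cdb["blocks"])  # 999549 2
```

so the failing command is a two-sector read starting at absolute sector 999549, which is the sector named in the scsidisk I/O error line.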
the same thing happened the following days. i have strong reason to
assume that there were no other jobs running on this system over
xmas. when i logged in on dec. 30, i found these messages and
thought it might be a good idea to umount /dev/sdb1 (sdb is id 1) and
run e2fsck (1.02, 16-Jan-96 for EXT2 FS 0.5b, 95/08/09) on it. when
i did that, i could watch the system load go up and initially was
able to do other things on the system, but responsiveness decreased
until i couldn't do anything (not even switch between virtual consoles).
telnet could connect, but waited forever, ping worked. after about
40 minutes i decided to hit the reset key (had almost forgotten where
it is since i kicked nt off my hard disks :).
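the "dev 08:11" in the kernel messages can be mapped to /dev/sdb1 as follows. this is a sketch only: it assumes the kernel prints major and minor in hex and that SCSI disks use major 8 with 16 minors per disk; `decode_sd_dev` is a hypothetical helper name.

```python
def decode_sd_dev(dev):
    """Turn a 'major:minor' hex pair such as '08:11' into an sd device name."""
    major_s, minor_s = dev.split(":")
    major, minor = int(major_s, 16), int(minor_s, 16)
    if major != 8:
        raise ValueError("not a SCSI disk major number")
    # 16 minors per disk: minor 0 of each group is the whole disk,
    # minors 1..15 are its partitions.
    disk, part = divmod(minor, 16)
    name = "sd" + chr(ord("a") + disk)
    return name + (str(part) if part else "")


print(decode_sd_dev("08:11"))  # sdb1
```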
then i found the following messages
Dec 30 17:33:19 deneb kernel: SCSI bus is being reset for host 0 channel 0.
Dec 30 17:33:19 deneb kernel: (scsi0:-1:0) Reset device, active_scb 3
Dec 30 17:33:19 deneb kernel: scsi0: (targ -1/chan A) matching scb to (targ 1/chan A)
Dec 30 17:33:19 deneb last message repeated 2 times
Dec 30 17:33:19 deneb kernel: scsi0: Resetting current channel A
Dec 30 17:33:19 deneb kernel: scsi0: Channel reset, sequencer restarted
Dec 30 17:33:20 deneb kernel: (scsi0:1:0) Aborting scb 3
Dec 30 17:33:24 deneb kernel: (scsi0:1:0) Underflow - Wanted at least 27648, got 1024, residual SG count 25.
Dec 30 17:33:26 deneb kernel: (scsi0:1:0) Underflow - Wanted at least 27648, got 1024, residual SG count 25.
Dec 30 17:33:26 deneb kernel: scsi0: MEDIUM ERROR on channel 0, id 1, lun 0, CDB: Read (6) 0f 40 79 36 00
Dec 30 17:33:26 deneb kernel: Current error sd08:11: sense key Medium Error
Dec 30 17:33:26 deneb kernel: Additional sense indicates Data synchronization mark error
Dec 30 17:33:26 deneb kernel: scsidisk I/O error: dev 08:11, sector 999486, absolute sector 999549
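the numbers in the underflow message can be cross-checked against the CDB of this larger read. a rough sketch, assuming 512-byte sectors and that the log shows the five bytes after the Read (6) opcode; `read6_range` is an illustrative name, not anything from the driver.

```python
SECTOR_SIZE = 512  # assumption: 512-byte sectors


def read6_range(tail_hex):
    """Return (start sector, sector count) for the bytes after a Read (6) opcode."""
    b = bytes.fromhex(tail_hex.replace(" ", ""))
    start = ((b[0] & 0x1F) << 16) | (b[1] << 8) | b[2]
    count = b[3] or 256  # a length byte of 0 means 256 blocks
    return start, count


start, count = read6_range("0f 40 79 36 00")
wanted_bytes = count * SECTOR_SIZE
bad_sector = 999549  # "absolute sector" from the scsidisk I/O error line

print(start, count, wanted_bytes)            # 999545 54 27648
print(start <= bad_sector < start + count)   # True
```

the 27648 bytes match the "Wanted at least 27648" in the underflow lines, and the sector that failed on dec 26 lies inside the range requested here.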
after reboot i was able to rerun e2fsck without
the above problems, and i don't seem to have lost any data (keeping
my fingers crossed).
however, the following obscure lines landed in /var/log/messages (i
have no idea what they mean, maybe they are not related to the above
problem at all; there is some binary data interspersed which i have
omitted):
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
**********BEGIN***********
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
SE THE SOFTWARE.
Motorola assumes no responsibility for the maintenance and support of the SOFTWARE.
You are hereby granted a copyright license to use, modify, and distribute the SOFTWARE
so long as this entire notice is retained without alteration in any modified and/or
redistributed versions, and that such modified versions are clearly identified as such.
No licenses are granted by implication, estoppel or otherwise under any patents
or trademarks of Motorola, Inc.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
68060 SOFTWARE PACKAGE (Kernel version) SIMPLE TESTS
-----------------------------------------------------
The files itest.sa and ftest.sa contain simple tests to check
the state of the 68060ISP and 68060FPSP once they have been installed.
Release file format:
--------------------
The release files itest.sa and ftest.sa are essentially
hexadecimal images of the actual tests. This format is the
ONLY format that will be supported. The hex images were created
by assembling the source co29112he {i,f}test.sa hex files contain
no symbol names, this section contains function entry points that are fixed
with respect to the top of the package. The currently defined entry-points
are listed in section "68060{ISP,FPSP}-TEST entry points" below. A calling
routine would simply execute a "bsr" or "jsr" that jumped to the selected
function entry-point.
For example, to run the 060ISP test, write a program that includes the
itest.sa data and execute something similar to:
bsr _060ISP_TEST+128+0
(_060ISP_TEST is the starting address of the "Call-out" section; the "Call-out"
section is 128 bytes long; and the 68060ISP test entry point is located
0 bytes from the top of the "Entry-point" section.)
The third section is the code section. After entering through an "Entry-point",
the entry code jumps to the appropriate test code within the code section.
68060ISP-TEST Call-outs:
------------------------
0x0: _print_string()
0x4: _print_number()
68060FPSP-TEST Call-outs:
-------------------------
0x0: _print_string()
0x4: _print_number()
The test packages call _print_string() and _print_number()
as subroutines and expect the main program to print a string
or a number to a file or to the screen.
In "C"-like fashion, the test program calls:
print_string("Test passed");
or
print_number(20);
For _print_string(), the test programs pass a longword address
of the string on the stack. For _print_number(), the test programs pass
a longword number to be printed.
For debugging purposes, after the main program performs a "print"
for a test package, it should flush the output so that it's not
buffered. In this way, if the test program crashes, at least the previous
statements printed will be seen.
68060ISP-TEST Entry-points:
---------------------------
0x0: integer test
68060FPSP-TEST Entry-points:
----------------------------
0x00: main fp test
0x08: FP unimplemented test
0x10: FP enabled snan/operr/ovfl/unfl/dz/inex
The floating-point unit test has 3 entry points which will require
3 different calls to the package if each of the three following tests
is desired:
main fp test: tests (1) unimp effective address exception
(2) unsupported data type exceptions
(3) non-maskable overflow/underflow exceptions
FP unimplemented: tests FP unimplemented exception. this one is
separate from the previous tests for systems that don't
want FP unimplemented instructions.
FP enabled: tests enabled snan/operr/ovfl/unfl/dz/inex.
basically, it enables each of these exceptions and forces
each using an implemented FP instruction. this process
exercises _fpsp_{snan,operr,ovfl,unfl,dz,inex}() and
_real_{snan,operr,ovfl,unfl,dz,inex}(). the test expects
_real_XXXX() to do nothing except clear the exception
and "rte". if a system's _real_XXXX() handler creates an
alternate result, the test will print "failed" but this
is acceptable.
Miscellaneous:
--------------
Again, itest.sa and ftest.sa are simple tests and do not thoroughly
test all 68060SP connections. For example, they do not test connections
to _real_access(), _real_trace(), _real_trap(), etc. because these
will be system-implemented several different ways and the test packages
must remain system independent.
Example test package set-up:
----------------------------
_print_str:
. # provided by system
rts
_print_num:
. # provided by system
rts
.
.
bsr _060FPSP_TEST+128+0
.
.
rts
# beginning of "Call-out" section; provided by integrator.
# MUST be 128 bytes long.
_060FPSP_TEST:
long _print_str - _060FPSP_TEST
long _print_num - _060FPSP_TEST
space 120
# ftest.sa starts here; start of "Entry-point" section.
long 0x60ff0000, 0x00002346
long 0x60ff0000, 0x00018766
long 0x60ff0000, 0x00023338
long 0x24377299, 0xab2643ea
.
.
.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
************END************
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
can anybody explain to me what this means and tell me what is wrong
with my disk?
when i run `e2fsck -f /dev/sdb1' now, i get the following output:
Pass 1: Checking inodes, blocks, and sizes
Error reading block 499743 (Attempt to read block from filesystem resulted in short read) while doing inode scan. Ignore error<y>? yes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/sdb1: 99325/1101824 files (10.9% non-contiguous), 4170147/4401778 blocks
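block 499743 in the e2fsck output appears to be the same spot on the disk as in the kernel messages. the sketch below makes several assumptions that are not confirmed anywhere above: 1024-byte ext2 blocks, 512-byte sectors, 128-byte inodes, an inodes-per-group count that is a multiple of 8, and a partition start of 63 sectors (taken as the difference between "absolute sector" and "sector" in the log). the function names are made up for illustration.

```python
BLOCK_SIZE = 1024             # assumption: 1 KB ext2 blocks
SECTOR_SIZE = 512             # assumption: 512-byte sectors
INODE_SIZE = 128              # assumption: 128-byte on-disk inodes
PART_START = 999549 - 999486  # 63 sectors, inferred from the kernel log


def block_to_abs_sectors(block):
    """Absolute disk sectors covered by one filesystem block."""
    per_block = BLOCK_SIZE // SECTOR_SIZE
    first = PART_START + block * per_block
    return list(range(first, first + per_block))


def inodes_sharing_block(inode):
    """Inode numbers stored in the same inode-table block as `inode`."""
    per_block = BLOCK_SIZE // INODE_SIZE
    first = ((inode - 1) // per_block) * per_block + 1  # inodes count from 1
    return list(range(first, first + per_block))


print(block_to_abs_sectors(499743))  # [999549, 999550]
print(inodes_sharing_block(125010))  # 125009 through 125016
```

under these assumptions the unreadable block holds inodes 125009 to 125016, which would cover all three inode numbers in the EXT2-fs error lines.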
any help is highly appreciated. i'm a little concerned...
yours, michael.
ps:
$ cat /proc/scsi/aic7xxx/0
Adaptec AIC7xxx driver version: 4.1/3.2
Compile Options:
AIC7XXX_RESET_DELAY : 15
AIC7XXX_TAGGED_QUEUEING: Disabled
AIC7XXX_PAGE_ENABLE : Disabled
AIC7XXX_PROC_STATS : Disabled
Adapter Configuration:
SCSI Adapter: Adaptec AHA-294X Ultra SCSI host adapter
(AIC-788x chipset)
Host Bus: Wide
Base IO: 0x6000
Base IO Memory: 0xe2000000
IRQ: 11
SCBs: Used 4, HW 16, Page 16
Interrupts: 52215
Serial EEPROM: True
Extended Translation: Enabled
SCSI Bus Reset: Enabled
Ultra SCSI: Disabled
Disconnect Enable Flags: 0xffcf
$ cat /proc/scsi/scsi
Attached devices:
Host: scsi0 Channel: 00 Id: 00 Lun: 00
Vendor: Quantum Model: XP34300W Rev: L912
Type: Direct-Access ANSI SCSI revision: 02
Host: scsi0 Channel: 00 Id: 01 Lun: 00
Vendor: Quantum Model: XP34300W Rev: L912
Type: Direct-Access ANSI SCSI revision: 02
Host: scsi0 Channel: 00 Id: 03 Lun: 00
Vendor: PIONEER Model: CD-ROM DR-U10X Rev: 1.07
Type: CD-ROM ANSI SCSI revision: 02
--
Michael Granzow
Dep. of Physics, University of Oldenburg phone: .49.441.7982905
D-26111 Oldenburg fax: .49.441.7983698
Federal Republic of Germany e-mail(mg@medi.physik.uni-oldenburg.de)