[3023] in linux-scsi channel archive


problems with scsi disk on AHA 2940

daemon@ATHENA.MIT.EDU (Michael Granzow)
Sun Jan 4 07:00:09 1998

Date: 	Sun, 4 Jan 98 12:18:01 +0100
From: Michael Granzow <mg@medi.physik.uni-oldenburg.de>
To: linux-scsi@vger.rutgers.edu

here's my system:

 linux 2.0.32 (SMP) on a
 tyan S1668 Dual PentiumPro ATX 440Fx
 with 2 PPro 200MHz and 64 MB RAM
 adaptec AHA 2940 UW scsi controller
 2 Quantum XP34300W 4.3 GB scsi disks.

here's my problem:

 part of a cron job that runs every day in the early morning is to
 update the database for the `locate' command.  usually, this doesn't
 cause trouble.  beginning on the 26th of dec, however, it caused the
 following kernel messages:

Dec 26 06:43:56 deneb kernel: scsi0: MEDIUM ERROR on channel 0, id 1, lun 0, CDB: Read (6) 0f 40 7d 02 00 
Dec 26 06:43:56 deneb kernel: Current error sd08:11: sense key Medium Error
Dec 26 06:43:56 deneb kernel: Additional sense indicates Unrecovered read error
Dec 26 06:43:56 deneb kernel: scsidisk I/O error: dev 08:11, sector 999486, absolute sector 999549
Dec 26 06:43:56 deneb kernel: EXT2-fs error (device 08:11): ext2_read_inode: unable to read inode block - inode=125009, block=499743
Dec 26 06:44:00 deneb kernel: scsi0: MEDIUM ERROR on channel 0, id 1, lun 0, CDB: Read (6) 0f 40 7d 02 00 
Dec 26 06:44:00 deneb kernel: Current error sd08:11: sense key Medium Error
Dec 26 06:44:00 deneb kernel: Additional sense indicates Data synchronization mark error
Dec 26 06:44:00 deneb kernel: scsidisk I/O error: dev 08:11, sector 999486, absolute sector 999549
Dec 26 06:44:00 deneb kernel: EXT2-fs error (device 08:11): ext2_read_inode: unable to read inode block - inode=125010, block=499743
Dec 26 06:44:03 deneb kernel: scsi0: MEDIUM ERROR on channel 0, id 1, lun 0, CDB: Read (6) 0f 40 7d 02 00 
Dec 26 06:44:03 deneb kernel: Current error sd08:11: sense key Medium Error
Dec 26 06:44:03 deneb kernel: Additional sense indicates Data synchronization mark error
Dec 26 06:44:03 deneb kernel: scsidisk I/O error: dev 08:11, sector 999486, absolute sector 999549
Dec 26 06:44:03 deneb kernel: EXT2-fs error (device 08:11): ext2_read_inode: unable to read inode block - inode=125016, block=499743
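
 for what it's worth, the numbers in these messages look consistent
 with each other.  below is a small python sketch of how i read them.
 it assumes the standard 6-byte READ layout (21-bit block address, then
 a one-byte length), 512-byte sectors, and a 1024-byte ext2 block size;
 the function names are my own and nothing here is taken from the
 driver source.

```python
# sketch only: assumes the standard 6-byte READ layout and 512-byte sectors.
SECTOR_SIZE = 512      # assumed sector size in bytes
FS_BLOCK_SIZE = 1024   # assumed ext2 block size in bytes

def decode_read6(printed):
    """Return (lba, sector_count) from the bytes logged after 'Read (6)'."""
    b = [int(x, 16) for x in printed.split()]
    lba = ((b[0] & 0x1F) << 16) | (b[1] << 8) | b[2]  # 21-bit block address
    count = b[3] or 256  # in READ(6) a length byte of 0 means 256 blocks
    return lba, count

def sector_to_fs_block(sector):
    """Map a partition-relative 512-byte sector to a filesystem block."""
    return sector * SECTOR_SIZE // FS_BLOCK_SIZE

lba, count = decode_read6("0f 40 7d 02 00")
partition_start = lba - 999486  # absolute sector minus relative sector
print(lba, count, partition_start, sector_to_fs_block(999486))
# prints: 999549 2 63 499743
```

 so the CDB address matches the "absolute sector", the two sectors
 requested make up one 1024-byte block, and that block is the 499743
 that ext2 complains about.  the difference of 63 sectors would then be
 where the partition starts on the disk.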

 the same thing happened on the following days.  i have strong reason to
 assume that there were no other jobs running on this system over
 xmas.  when i logged in on dec. 30, i found these messages and
 thought it might be a good idea to umount /dev/sdb1 (sdb is id 1) and
 run e2fsck (1.02, 16-Jan-96 for EXT2 FS 0.5b, 95/08/09) on it.  when
 i did that, i could watch the system load go up and initially was
 able to do other things on the system, but responsiveness decreased
 until i couldn't do anything (not even switch between virtual consoles).
 telnet could connect, but waited forever, ping worked.  after about
 40 minutes i decided to hit the reset key (had almost forgotten where
 it is since i kicked nt off my hard disks :).
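
 the "dev 08:11" in the kernel messages should be the same partition.
 a minimal sketch of the mapping, assuming the numbers are printed in
 hex as major:minor, that major 8 is the scsi disk driver, and that
 each disk gets 16 minors (the helper name is made up):

```python
def decode_sd_dev(dev):
    """Turn a 'major:minor' hex pair such as '08:11' into (major, name)."""
    major, minor = (int(x, 16) for x in dev.split(":"))
    disk = minor >> 4        # assumed: 16 minors per scsi disk
    part = minor & 0x0F      # 0 means the whole disk
    name = "sd" + chr(ord("a") + disk)
    if part:
        name += str(part)
    return major, name

print(decode_sd_dev("08:11"))  # prints: (8, 'sdb1')
```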

 then i found the following messages

Dec 30 17:33:19 deneb kernel: SCSI bus is being reset for host 0 channel 0.
Dec 30 17:33:19 deneb kernel: (scsi0:-1:0) Reset device, active_scb 3
Dec 30 17:33:19 deneb kernel: scsi0: (targ -1/chan A) matching scb to (targ 1/chan A)
Dec 30 17:33:19 deneb last message repeated 2 times
Dec 30 17:33:19 deneb kernel: scsi0: Resetting current channel A
Dec 30 17:33:19 deneb kernel: scsi0: Channel reset, sequencer restarted
Dec 30 17:33:20 deneb kernel: (scsi0:1:0) Aborting scb 3
Dec 30 17:33:24 deneb kernel: (scsi0:1:0) Underflow - Wanted at least 27648, got 1024, residual SG count 25.
Dec 30 17:33:26 deneb kernel: (scsi0:1:0) Underflow - Wanted at least 27648, got 1024, residual SG count 25.
Dec 30 17:33:26 deneb kernel: scsi0: MEDIUM ERROR on channel 0, id 1, lun 0, CDB: Read (6) 0f 40 79 36 00 
Dec 30 17:33:26 deneb kernel: Current error sd08:11: sense key Medium Error
Dec 30 17:33:26 deneb kernel: Additional sense indicates Data synchronization mark error
Dec 30 17:33:26 deneb kernel: scsidisk I/O error: dev 08:11, sector 999486, absolute sector 999549
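
 the byte count in the "Underflow" lines also seems to follow from the
 CDB of that read.  a quick check in python, again assuming the 6-byte
 READ layout and 512-byte sectors (my reading, not the driver's code):

```python
SECTOR_SIZE = 512  # assumed sector size in bytes

cdb = bytes.fromhex("0f 40 79 36 00")  # bytes logged after "Read (6)"
first = int.from_bytes(cdb[:3], "big") & 0x1FFFFF  # 21-bit start address
sectors = cdb[3] or 256                # a length byte of 0 means 256
wanted = sectors * SECTOR_SIZE         # bytes the request asked for
bad = 999549                           # absolute sector from the error line
covers_bad = first <= bad < first + sectors
print(first, sectors, wanted, covers_bad)
# prints: 999545 54 27648 True
```

 so this was a 54-sector read starting four sectors before the one
 that keeps failing, and 27648 is simply its full length in bytes.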


 after reboot i was able to rerun e2fsck without the above problems,
 and i don't seem to have lost any data (keeping my fingers crossed).

 however, the following obscure lines landed in /var/log/messages (i
 have no idea what they mean, maybe they are not related to the above
 problem at all; there is some binary data interspersed which i have
 omitted): 

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
         **********BEGIN***********
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

SE THE SOFTWARE.
Motorola assumes no responsibility for the maintenance and support of the SOFTWARE.

You are hereby granted a copyright license to use, modify, and distribute the SOFTWARE
so long as this entire notice is retained without alteration in any modified and/or
redistributed versions, and that such modified versions are clearly identified as such.
No licenses are granted by implication, estoppel or otherwise under any patents
or trademarks of Motorola, Inc.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
68060 SOFTWARE PACKAGE (Kernel version) SIMPLE TESTS
-----------------------------------------------------

The files itest.sa and ftest.sa contain simple tests to check
the state of the 68060ISP and 68060FPSP once they have been installed.

Release file format:
--------------------
The release files itest.sa and ftest.sa are essentially
hexadecimal images of the actual tests. This format is the
ONLY format that will be supported. The hex images were created
by assembling the source co29112he {i,f}test.sa hex files contain
no symbol names, this section contains function entry points that are fixed
with respect to the top of the package. The currently defined entry-points
are listed in section "68060{ISP,FPSP}-TEST entry points" below. A calling
routine would simply execute a "bsr" or "jsr" that jumped to the selected
function entry-point.

For example, to run the 060ISP test, write a program that includes the
itest.sa data and execute something similar to:

	bsr	_060ISP_TEST+128+0

(_060ISP_TEST is the starting address of the "Call-out" section; the "Call-out"
section is 128 bytes long; and the 68060ISP test entry point is located
0 bytes from the top of the "Entry-point" section.)

The third section is the code section. After entering through an "Entry-point",
the entry code jumps to the appropriate test code within the code section.

68060ISP-TEST Call-outs:
------------------------
0x0: _print_string()
0x4: _print_number()

68060FPSP-TEST Call-outs:
-------------------------
0x0: _print_string()
0x4: _print_number()

The test packages call _print_string() and _print_number()
as subroutines and expect the main program to print a string
or a number to a file or to the screen.
In "C"-like fashion, the test program calls:

	print_string("Test passed");

		or

	print_number(20);

For _print_string(), the test programs pass a longword address
of the string on the stack. For _print_number(), the test programs pass
a longword number to be printed.

For debugging purposes, after the main program performs a "print"
for a test package, it should flush the output so that it's not
buffered. In this way, if the test program crashes, at least the previous
statements printed will be seen.

68060ISP-TEST Entry-points:
---------------------------
0x0: integer test

68060FPSP-TEST Entry-points:
----------------------------
0x00: main fp test
0x08: FP unimplemented test
0x10: FP enabled snan/operr/ovfl/unfl/dz/inex

The floating-point unit test has 3 entry points which will require
3 different calls to the package if each of the three following tests
is desired:

main fp test: tests (1) unimp effective address exception
		    (2) unsupported data type exceptions
		    (3) non-maskable overflow/underflow exceptions

FP unimplemented: tests FP unimplemented exception. this one is 
		  separate from the previous tests for systems that don't
		  want FP unimplemented instructions.

FP enabled: tests enabled snan/operr/ovfl/unfl/dz/inex.
	    basically, it enables each of these exceptions and forces
	    each using an implemented FP instruction. this process
	    exercises _fpsp_{snan,operr,ovfl,unfl,dz,inex}() and
	    _real_{snan,operr,ovfl,unfl,dz,inex}(). the test expects
	    _real_XXXX() to do nothing except clear the exception
	    and "rte". if a system's _real_XXXX() handler creates an
	    alternate result, the test will print "failed" but this
	    is acceptable.

Miscellaneous:
--------------
Again, itest.sa and ftest.sa are simple tests and do not thoroughly
test all 68060SP connections. For example, they do not test connections
to _real_access(), _real_trace(), _real_trap(), etc. because these
will be system-implemented several different ways and the test packages
must remain system independent.

Example test package set-up:
----------------------------
_print_str:
	.			# provided by system
	rts

_print_num:
	.			# provided by system
	rts

	.
	.
	bsr	_060FPSP_TEST+128+0
	.
	.
	rts

# beginning of "Call-out" section; provided by integrator.
# MUST be 128 bytes long.
_060FPSP_TEST:
	long	_print_str - _060FPSP_TEST	
	long	_print_num - _060FPSP_TEST
	space	120

# ftest.sa starts here; start of "Entry-point" section.
	long	0x60ff0000, 0x00002346
	long	0x60ff0000, 0x00018766
	long	0x60ff0000, 0x00023338
	long	0x24377299, 0xab2643ea
		.
		.
		.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
            ************END************
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

 can anybody explain to me what this means and tell me what is wrong
 with my disk?

 when i run `e2fsck -f /dev/sdb1' now, i get the following output:

Pass 1: Checking inodes, blocks, and sizes
Error reading block 499743 (Attempt to read block from filesystem resulted in short read) while doing inode scan.  Ignore error<y>? yes

Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/sdb1: 99325/1101824 files (10.9% non-contiguous), 4170147/4401778 blocks


 any help is highly appreciated.  i'm a little concerned...

yours, michael.

ps:
$ cat /proc/scsi/aic7xxx/0 
Adaptec AIC7xxx driver version: 4.1/3.2
Compile Options:
  AIC7XXX_RESET_DELAY    : 15
  AIC7XXX_TAGGED_QUEUEING: Disabled
  AIC7XXX_PAGE_ENABLE    : Disabled
  AIC7XXX_PROC_STATS     : Disabled

Adapter Configuration:
           SCSI Adapter: Adaptec AHA-294X Ultra SCSI host adapter
                         (AIC-788x chipset)
               Host Bus: Wide
                Base IO: 0x6000
         Base IO Memory: 0xe2000000
                    IRQ: 11
                   SCBs: Used 4, HW 16, Page 16
             Interrupts: 52215
          Serial EEPROM: True
   Extended Translation: Enabled
         SCSI Bus Reset: Enabled
             Ultra SCSI: Disabled
Disconnect Enable Flags: 0xffcf

$ cat /proc/scsi/scsi
Attached devices: 
Host: scsi0 Channel: 00 Id: 00 Lun: 00
  Vendor: Quantum  Model: XP34300W         Rev: L912
  Type:   Direct-Access                    ANSI SCSI revision: 02
Host: scsi0 Channel: 00 Id: 01 Lun: 00
  Vendor: Quantum  Model: XP34300W         Rev: L912
  Type:   Direct-Access                    ANSI SCSI revision: 02
Host: scsi0 Channel: 00 Id: 03 Lun: 00
  Vendor: PIONEER  Model: CD-ROM DR-U10X   Rev: 1.07
  Type:   CD-ROM                           ANSI SCSI revision: 02

-- 
Michael Granzow

Dep. of Physics, University of Oldenburg   phone:        .49.441.7982905
D-26111 Oldenburg                          fax:          .49.441.7983698
Federal Republic of Germany       e-mail(mg@medi.physik.uni-oldenburg.de)
