[552] in linux-scsi channel archive

scsi ext2fs errors, fs crashes

daemon@ATHENA.MIT.EDU (Oliver Kaiser)
Mon Aug 28 00:50:57 1995

Date: Sun, 27 Aug 95 21:33 MET DST
To: linux-scsi@vger.rutgers.edu
From: kaiser@itm.mw.tu-muenchen.de (Oliver Kaiser)
Cc: kaiser@itm.mw.tu-muenchen.de

I hope this is the right place to post this sort of headache :)

I want to upgrade my well-working Linux box from a 486DX2 50 MHz, 8 MB, 240 MB
IDE to a Compaq ProLiant 1000 server: 60 MHz Pentium, 32 MB, 3com509,
Adaptec 2742 Twin, 2 HP C3323-300 (1 GB), COMPAQ C2247 (1 GB, equivalent to
HP 2247), COMPAQ CD-ROM CR-503BCQ. After several successful installations I
got a filesystem crash which destroyed some data on my disks. :(

After some investigation I tracked the problem down to heavy SCSI I/O
traffic. It can be reproduced with a few simultaneous "cp -a /usr/*
/disk2/backup/usr &" commands. In syslog I get a lot of error messages (see
the end of this mail). When I increased the number of simultaneous "cp"
commands I got lots of processes with a "D" status, which disappeared after
a while. (What is D?)
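
For reference: in ps output on Linux, "D" marks a process in uninterruptible
sleep, which usually means it is waiting for disk I/O to complete. Below is a
minimal sketch, assuming a mounted /proc filesystem with per-process "stat"
files, that lists the processes currently in that state. The function names
are made up for illustration and are not part of any existing tool.

```python
import os


def parse_state(stat_line):
    """Return (pid, comm, state) from the text of one /proc/<pid>/stat file."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or parentheses, so split around the LAST closing parenthesis.
    left, _, right = stat_line.rpartition(")")
    pid_text, _, comm = left.partition("(")
    return int(pid_text), comm, right.split()[0]


def uninterruptible(proc_root="/proc"):
    """List (pid, comm) for every process whose state field is 'D'."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a per-process directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as handle:
                pid, comm, state = parse_state(handle.read())
        except OSError:
            continue  # the process exited while we were scanning
        if state == "D":
            found.append((pid, comm))
    return found
```

Calling uninterruptible() repeatedly while the "cp" jobs are running would
show whether the same processes stay blocked or whether they only pass
through the state briefly.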

Because of these damaged filesystems I tested my Linux box with the
program e2fsck. Oddly, I can repair as much as I want: whenever I run
"e2fsck -f -n /dev/sd*" I get different error messages. These tests were made
immediately after each other, without any user traffic and without any other
commands in between. Approximately every second test gives a totally
different error message. e2fsck complains about lost inodes, duplicate
references etc.
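
One way to narrow this down without involving e2fsck would be to read the
same region of the unmounted partition several times and compare checksums.
The sketch below assumes a raw device or file that can be opened read-only;
chunk_digests and unstable_chunks are hypothetical helper names. Note the
limitation: a repeated read may be served from the kernel's buffer cache
instead of the disk, so a clean result proves less than a mismatch does.

```python
import hashlib


def chunk_digests(path, chunk_size=1024 * 1024, limit=None):
    """Read `path` sequentially; return one SHA-256 hex digest per chunk."""
    digests = []
    with open(path, "rb") as source:
        while limit is None or len(digests) < limit:
            chunk = source.read(chunk_size)
            if not chunk:
                break  # end of file or device
            digests.append(hashlib.sha256(chunk).hexdigest())
    return digests


def unstable_chunks(path, passes=2, chunk_size=1024 * 1024, limit=None):
    """Return indices of chunks whose digest differs from the first pass."""
    reference = chunk_digests(path, chunk_size, limit)
    unstable = set()
    for _ in range(passes - 1):
        current = chunk_digests(path, chunk_size, limit)
        for index in range(max(len(reference), len(current))):
            first = reference[index] if index < len(reference) else None
            again = current[index] if index < len(current) else None
            if first != again:
                unstable.add(index)  # data changed, or a read came up short
    return sorted(unstable)
```

For example, unstable_chunks("/dev/sda4", passes=3, limit=64) would read the
first 64 MB of that partition three times. A non-empty result on an idle,
unmounted partition would point at the read path (cabling, adapter, driver)
and away from ext2fs itself.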

What I have tried to solve the problem:
o I checked the SCSI termination
  (Only one device at each end terminates the SCSI bus.)
o checked the SCSI signals statically and dynamically with an oscilloscope
  (signal ranges: 2.8 to 3 Volt and 0 Volt)
o checked the termination power (4.7 to 5 Volt)
o changed termination from active to passive
o reduced the SCSI bus to a short SCSI cable (longer than 30cm) with only
  one disk and the 2742AT
o changed the SCSI cables
o cleaned the PCB EISA connector
o replaced the AHA 2742AT with a similar 2742 (no twin)
o replaced the HP disk with the COMPAQ C2247 disk
o lots of combinations of the above
o thinking and reading - but no more ideas

Other important stuff:
o I did my tests with an unmounted partition :)
o SCSI boot messages:
aha274x warning: ignoring channel B of 274x-twin
aha274x: extended translation disabled
AHA274X AT EISA SLOT 5:
    irq 11
    bus release time 60 bclks
    data fifo threshold 100%
    SCSI CHANNEL A:
        scsi id 7
        scsi bus parity check enabled
        scsi selection timeout 256 ms
        scsi bus reset at power-on enabled
        scsi bus termination enabled
scsi0 : Adaptec AHA274x/284x (EISA/VL-bus -> Fast SCSI) 1.28/1.11/1.29
scsi : 1 host.
aha274x: target 0 now synchronous at 10.0Mb/s
  Vendor: HP        Model: C3323-300         Rev: 4269
  Type:   Direct-Access                      ANSI SCSI revision: 02
Detected scsi disk sda at scsi0, id 0, lun 0
aha274x: target 1 now synchronous at 10.0Mb/s
  Vendor: HP        Model: C3323-300         Rev: 4269
  Type:   Direct-Access                      ANSI SCSI revision: 02
Detected scsi disk sdb at scsi0, id 1, lun 0
  Vendor: COMPAQ    Model: CD-ROM CR-503BCQ  Rev: 1.0b
  Type:   CD-ROM                             ANSI SCSI revision: 02
Detected scsi CD-ROM sr0 at scsi0, id 5, lun 0
scsi : detected 1 SCSI cdrom 2 SCSI disks total.
SCSI Hardware sector size is 512 bytes on device sda
SCSI Hardware sector size is 512 bytes on device sdb

Some syslog entries:
-----------------------------------------------------------------------------
Aug 11 15:23:15 fly kernel: EXT2-fs warning (device 8/4): empty_dir: bad
directory (dir 41032)
Aug 11 15:23:15 fly kernel: EXT2-fs warning (device 8/4): ext2_rmdir: empty
directory has nlink!=2 (3237)
Aug 11 15:24:07 fly kernel: EXT2-fs warning (device 8/4): ext2_rmdir: empty
directory has nlink!=2 (1)
Aug 11 15:29:31 fly kernel: free_blocks: Freeing blocks not in datazone -
block = 3669681297, count = 1
Aug 15 21:11:49 fly kernel: offset=808, inode=1441792, rec_len=0, name_len=65529
Aug 16 15:56:14 bee kernel: scsi : aborting command due to timeout : pid
17267, scsi0, id 0, lun 0 0x08 1e 0f 86 02 00 
Aug 16 15:56:15 bee kernel: scsi : aborting command due to timeout : pid
17268, scsi0, id 1, lun 0 0x0a 14 40 3c 02 00 
Aug 16 15:56:22 bee kernel: scsi : aborting command due to timeout : pid
17267, scsi0, id 0, lun 0 0x08 1e 0f 86 02 00 
Aug 16 16:08:19 bee kernel: EXT2-fs error (device 8/17): ext2_free_blocks:
Freeing blocks not in datazone - block = 715289206, count = 1
Aug 16 16:11:14 bee kernel: EXT2-fs warning (device 8/17): ext2_rmdir: empty
directory has nlink!=2 (0)
Aug 16 16:11:14 bee kernel: EXT2-fs warning (device 8/17): ext2_free_blocks:
bit already cleared for block 665013
Aug 16 16:11:14 bee kernel: EXT2-fs warning (device 8/17): ext2_free_inode:
bit already cleared for inode 165292
Aug 16 15:56:22 bee kernel: SCSI host 0 abort() timed out - resetting
Aug 16 15:56:22 bee kernel: aha274x: attempting to reset scsi bus and card
Aug 16 15:56:22 bee kernel: aha274x: target 1 now synchronous at 10.0Mb/s
Aug 16 15:56:22 bee kernel: aha274x: target 1 underflow - wanted (at least)
1024, got 16
Aug 16 15:56:22 bee kernel: aha274x: target 0 now synchronous at 10.0Mb/s
Aug 16 15:56:22 bee kernel: aha274x: target 0 underflow - wanted (at least)
1024, got 16
Aug 16 15:56:22 bee kernel: aha274x: target 1 underflow - wanted (at least)
1024, got 16
Aug 16 15:56:22 bee kernel: SCSI disk error : host 0 id 0 lun 0 return code
= 27070000
Aug 16 15:56:22 bee kernel: scsidisk I/O error: dev 0804, sector 1689478
Aug 16 15:56:22 bee kernel: SCSI disk error : host 0 id 1 lun 0 return code
= 27070000
Aug 16 15:56:22 bee kernel: scsidisk I/O error: dev 0811, sector 1327132
cp: /usr/X11R6/bin/xstdcmap: I/O error
Aug 16 16:47:52 bee kernel: zone - block = 1701408111, count = 1
------------------------------------------------------------------------------
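
To make the entries above easier to read, here is a small sketch that decodes
two kinds of values from them. It assumes that the hex bytes printed after
"aborting command due to timeout" are the raw 6-byte SCSI command block, and
it relies on the usual Linux numbering for SCSI disks (major 8, 16 minors per
disk). The function names are made up for illustration.

```python
OPCODES_6 = {0x08: "READ(6)", 0x0A: "WRITE(6)"}


def decode_cdb6(hex_bytes):
    """Decode a 6-byte READ(6)/WRITE(6) command given as hex text."""
    cdb = [int(token, 16) for token in hex_bytes.split()]
    if len(cdb) != 6 or cdb[0] not in OPCODES_6:
        raise ValueError("not a 6-byte READ/WRITE command: %r" % hex_bytes)
    # Byte 1 carries the LUN in its top 3 bits (SCSI-2) and the top 5 bits
    # of the 21-bit logical block address in its low bits.
    lba = ((cdb[1] & 0x1F) << 16) | (cdb[2] << 8) | cdb[3]
    blocks = cdb[4] or 256  # a length byte of 0 means 256 blocks
    return OPCODES_6[cdb[0]], lba, blocks


def sd_name(major, minor):
    """Map a SCSI disk device number to its conventional /dev name."""
    if major != 8:
        raise ValueError("major %d is not the SCSI disk major" % major)
    disk, partition = divmod(minor, 16)
    name = "sd" + chr(ord("a") + disk)
    return name + (str(partition) if partition else "")
```

decode_cdb6("0x08 1e 0f 86 02 00") gives ("READ(6)", 1970054, 2) and
decode_cdb6("0x0a 14 40 3c 02 00") gives ("WRITE(6)", 1327164, 2). With
512-byte sectors, 2 blocks are 1024 bytes, which matches the "wanted (at
least) 1024" in the underflow messages. sd_name(8, 4) gives "sda4" and
sd_name(8, 17) gives "sdb1", so "device 8/4" and "dev 0804" refer to sda4,
while "device 8/17" and "dev 0811" refer to sdb1.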

If you need more info - no problem, just mail. :)
What can I do to reduce my headaches (and to calm down my users :) ?
What is wrong? My fault? A bug? But where? SCSI-Bug? ext2fs bug? What can I
do to debug the problem?
Every comment is welcome!
(e.g.: "buy scsi host adapter xyz" but not "switch to OtherUnix" :)

Greetings
Oliver

--
    ////     Oliver Kaiser                         Tel. +49(89)2105-5335
   (o o)     e-mail kaiser@itm.mw.tu-muenchen.de   Fax. +49(89)2105-5310
oOO-(_)-OOo  http   coming soon              Priv. Tel. +49(89)1234415

