[2760] in linux-scsi channel archive


Problems making big fs on DPT RAID

daemon@ATHENA.MIT.EDU (Chris Adams)
Wed Nov 5 17:10:14 1997

From: Chris Adams <cadams@ro.com>
To: Remy.Card@linux.org, mike@i-Connect.Net, tstso@MIT.EDU
Date: 	Wed, 5 Nov 1997 16:04:23 -0600 (CST)
Cc: linux-kernel@vger.rutgers.edu, linux-eata@i-Connect.Net,
        linux-scsi@vger.rutgers.edu

I am having trouble setting up filesystems on a big array with a DPT
PM3334UW, e2fsprogs 1.10 (from RedHat 4.2), and Linux 2.0.31.  Whenever
anything accesses the drives attached to the DPT (even the drive that is
not in the array), the kernel gobbles memory, and I don't mean
buffers/cache.  Here are the first commands I ran after a bootup where
the partitions had to be fscked:

1:newnews:~$ w
  3:36pm  up 1 min,  1 user,  load average: 0.49, 0.19, 0.07
USER     TTY      FROM              LOGIN@  IDLE   JCPU   PCPU  WHAT
cadams   ttyp0    sprocket          3:36pm  0.00s  0.13s  0.02s  w 
2:newnews:~$ free
             total       used       free     shared    buffers     cached
Mem:        322024     182160     139864       6384     155076       3960
-/+ buffers:            23124     298900
Swap:       130748          0     130748
3:newnews:~$

Here is after a clean reboot (no fsck):

1:newnews:~$ w
  3:42pm  up 0 min,  1 user,  load average: 0.33, 0.09, 0.03
USER     TTY      FROM              LOGIN@  IDLE   JCPU   PCPU  WHAT
cadams   ttyp0    sprocket          3:42pm  0.00s  0.15s  0.01s  w 
2:newnews:~$ free
             total       used       free     shared    buffers     cached
Mem:        322024       8908     313116       6380       1668       3920
-/+ buffers:             3320     318704
Swap:       130748          0     130748
3:newnews:~$
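
To put a number on the difference, here is a small sketch that redoes
the "-/+ buffers" arithmetic from the two free outputs above.  The
function name is my own, and the values (in kB) are copied straight
from the output; nothing here is measured separately.

```python
def used_without_buffers(used, buffers, cached):
    """Memory in use (kB) once buffers and cache are subtracted,
    i.e. the 'used' figure on the '-/+ buffers:' line of free."""
    return used - buffers - cached

# Values copied from the "Mem:" lines above (kB).
after_fsck = used_without_buffers(182160, 155076, 3960)
clean_boot = used_without_buffers(8908, 1668, 3920)

# Extra memory held after the fsck boot that is not buffers or cache.
extra = after_fsck - clean_boot

print(after_fsck, clean_boot, extra)  # prints: 23124 3320 19804
```

So roughly 19 MB more is in use after the fsck boot than after the
clean one, not counting buffers and cache.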

When I try to make a big filesystem on the array (15G or more), the
system crashes.  Sometimes I get repeated messages like this, as fast
as the screen can scroll:

Aiee: scheduling in interrupt 0015b840

According to /System.map, 0015b840 is in the middle of
__get_request_wait.
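
For anyone who wants to repeat the lookup, this is roughly how an
address maps to a symbol: take the symbol with the highest start
address that is still at or below the target.  This is only a sketch.
The three sample lines are made up for illustration (only the name
__get_request_wait and the address 0015b840 come from my system), and
the helper names are my own.

```python
import bisect

def parse_system_map(text):
    """Parse 'address type name' lines into a sorted list of (addr, name)."""
    symbols = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 3:
            symbols.append((int(parts[0], 16), parts[2]))
    symbols.sort()
    return symbols

def symbol_for(address, symbols):
    """Return (name, offset) of the last symbol starting at or before address."""
    addrs = [addr for addr, _ in symbols]
    i = bisect.bisect_right(addrs, address) - 1
    if i < 0:
        raise ValueError("address is below the first symbol")
    base, name = symbols[i]
    return name, address - base

# Hypothetical excerpt: these start addresses are NOT from the real System.map.
SAMPLE = """\
0015b7f0 T hypothetical_prev_symbol
0015b81c t __get_request_wait
0015b8a4 T hypothetical_next_symbol
"""

name, offset = symbol_for(0x0015b840, parse_system_map(SAMPLE))
print(name, hex(offset))
```

With the real file, the sample text would be replaced by the contents
of /System.map.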

Other times when it crashes, I get something like this (hand typed,
since it never made it to the logs):

scsi0 channel 0 : resetting for second half of retries.
SCSI bus is being reset for host 0 channel 0.
eata_reset called pid:9053 target: 9 lun: 0 reason 0
eata_reset: slot 2 in reset, pid 9060.
eata_reset: slot 4 in reset, pid 9207.
eata_reset: slot 7 in reset, pid 9116.
eata_reset: slot 8 in reset, pid 9165.
eata_reset: slot 10 in reset, pid 9067.
eata_reset: slot 14 in reset, pid 9213.
eata_reset: slot 15 in reset, pid 9214.
eata_reset: slot 16 in reset, pid 9123.
eata_reset: slot 17 in reset, pid 9172.
eata_reset: slot 18 in reset, pid 9074.
eata_reset: slot 19 in reset, pid 9215.
eata_reset: slot 20 in reset, pid 9216.
eata_reset: slot 25 in reset, pid 9179.
eata_reset: slot 26 in reset, pid 9081.
eata_reset: slot 27 in reset, pid 9130.
eata_reset: slot 35 in reset, pid 9088.
eata_reset: slot 36 in reset, pid 9137.
eata_reset: slot 37 in reset, pid 9186.
eata_reset: slot 44 in reset, pid 9095.
eata_reset: slot 45 in reset, pid 9144.
eata_reset: slot 47 in reset, pid 9193.
eata_reset: slot 53 in reset, pid 9102.
eata_reset: slot 54 in reset, pid 9151.
eata_reset: slot 57 in reset, pid 9200.
eata_reset: slot 62 in reset, pid 9109.
eata_reset: slot 63 in reset, pid 9158.
eata_reset: board reset done, enabling interrupts.
eata_reset: interrups disabled again.
eata_reset: slot 2, DID_RESET, pid 9060 done.
eata_reset: slot 4, DID_RESET, pid 9207 done.
eata_reset: slot 7, DID_RESET, pid 9116 done.
eata_reset: slot 8, DID_RESET, pid 9165 done.
eata_reset: slot 10, DID_RESET, pid 9067 done.
eata_reset: slot 14, DID_RESET, pid 9213 done.
eata_reset: slot 15, DID_RESET, pid 9214 done.
eata_reset: slot 16, DID_RESET, pid 9123 done.
eata_reset: slot 17, DID_RESET, pid 9172 done.
eata_reset: slot 18, DID_RESET, pid 9074 done.
eata_reset: slot 19, DID_RESET, pid 9215 done.
eata_reset: slot 20, DID_RESET, pid 9216 done.
eata_reset: slot 25, DID_RESET, pid 9179 done.
eata_reset: slot 26, DID_RESET, pid 9081 done.
eata_reset: slot 27, DID_RESET, pid 9130 done.
eata_reset: slot 35, DID_RESET, pid 9088 done.
eata_reset: slot 36, DID_RESET, pid 9137 done.
eata_reset: slot 37, DID_RESET, pid 9186 done.
eata_reset: slot 44, DID_RESET, pid 9095 done.
eata_reset: slot 45, DID_RESET, pid 9144 done.
eata_reset: slot 47, DID_RESET, pid 9193 done.
eata_reset: slot 53, DID_RESET, pid 9102 done.
eata_reset: slot 54, DID_RESET, pid 9151 done.
eata_reset: slot 57, DID_RESET, pid 9200 done.
eata_reset: slot 62, DID_RESET, pid 9109 done.
eata_reset: slot 63, DID_RESET, pid 9158 done.
eata_reset: exit, wakeup.
eata_dma: in_handler, reseted command pid 9213 returned
eata_dma: in_handler, reseted command pid 9216 returned
eata_dma: in_handler, reseted command pid 9215 returned
eata_dma: in_handler, reseted command pid 9060 returned
eata_dma: in_handler, reseted command pid 9067 returned
eata_dma: in_handler, reseted command pid 9074 returned
eata_dma: in_handler, reseted command pid 9081 returned
eata_dma: in_handler, reseted command pid 9088 returned
eata_dma: in_handler, reseted command pid 9095 returned
eata_dma: in_handler, reseted command pid 9102 returned
eata_dma: in_handler, reseted command pid 9109 returned
eata_dma: in_handler, reseted command pid 9116 returned
eata_dma: in_handler, reseted command pid 9123 returned
eata_dma: in_handler, reseted command pid 9130 returned
eata_dma: in_handler, reseted command pid 9137 returned
eata_dma: in_handler, reseted command pid 9144 returned
eata_dma: in_handler, reseted command pid 9151 returned
eata_dma: in_handler, reseted command pid 9158 returned
eata_dma: in_handler, reseted command pid 9165 returned
eata_dma: in_handler, reseted command pid 9172 returned
eata_dma: in_handler, reseted command pid 9179 returned
eata_dma: in_handler, reseted command pid 9186 returned
eata_dma: in_handler, reseted command pid 9193 returned
eata_dma: in_handler, reseted command pid 9200 returned
eata_dma: in_handler, reseted command pid 9207 returned
eata_dma: in_handler, reseted command pid 9214 returned

What is really weird is that sometimes the system doesn't crash until
the filesystem has apparently been made (mke2fs has printed "done").  I
say apparently because I can't verify it: whenever the system crashes
while making the filesystem, it also crashes when I try to check it.

Also, can somebody tell me what units the '-R stride=xxx' option to
mke2fs (version 1.10) is in?  Is it 512 bytes, 1k, 2k, or what?  I
set the stripe size for the RAID 0 to 256k.  What should I set this
option to?
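
If the stride turns out to be counted in filesystem blocks (an
assumption on my part, which is exactly what I am asking about), and if
the 256k I set on the controller is the per-disk chunk rather than the
full stripe (also an assumption), the arithmetic would look like this
sketch.  The function name is my own.

```python
def stride_in_blocks(chunk_kib, block_size_bytes):
    """Stride as a count of filesystem blocks per RAID chunk.

    Assumes stride is measured in filesystem blocks and that chunk_kib
    is the per-disk chunk size; both are unconfirmed assumptions.
    """
    chunk_bytes = chunk_kib * 1024
    if chunk_bytes % block_size_bytes != 0:
        raise ValueError("chunk size is not a multiple of the block size")
    return chunk_bytes // block_size_bytes

# Candidate values for a 256k chunk at common ext2 block sizes.
for block_size in (1024, 2048, 4096):
    print(block_size, stride_in_blocks(256, block_size))
```

Under those assumptions the candidates are 256, 128, or 64 depending on
the block size used.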
-- 
Chris Adams - cadams@ro.com
System Administrator - Renaissance Internet Services
I don't speak for anybody but myself - that's enough trouble.
