[747] in linux-scsi channel archive


Segment violations during disk access

daemon@ATHENA.MIT.EDU (Louis Mandelstam)
Thu Nov 2 10:24:42 1995

Date: Thu, 2 Nov 1995 11:36:03 +0200 (GMT+0200)
From: Louis Mandelstam <louis@sacc.wn.apc.org>
To: linux-scsi@vger.rutgers.edu

Good day -

I've been experiencing a problem with the SCSI subsystem on a 1.2.13 
machine, with disk accesses sometimes causing the calling process to 
crash, occasionally taking the system with it.

One sure way to get the problem to show itself is to run badblocks 
/dev/sda1 <blocksondisk>   - badblocks almost always dies with a 'Segment 
violation' before completion.

I've tried replacing the hard disk (a Conner 1080S) with a Seagate 
Barracuda which is known to work fine, and the problem persisted.  I've 
also taken out all other SCSI devices and the problem wasn't affected.  
I've also tried going to single user and running badblocks without any 
other processes possibly accessing the disk, without any change.

I have not been able to try a different SCSI adapter as yet.  One warning 
which I have been wondering about is the output fdisk gives me when I run it:

'The number of cylinders for this disk is set to 1030.
This is larger than 1024, and may cause problems with some software.'

The partition table -

Disk /dev/sda: 64 heads, 32 sectors, 1030 cylinders
Units = cylinders of 2048 * 512 bytes

   Device Boot  Begin   Start     End  Blocks   Id  System
/dev/sda1           1       1    1030 1054719+  83  Linux native
Partition 1 has different physical/logical endings:
     phys=(1023, 63, 32) logical=(1029, 63, 32)
Partition 1 does not start on cylinder boundary:
     phys=(0, 0, 2) should be (0, 1, 1)
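
As a sanity check on the figures above, the arithmetic can be sketched 
in Python.  This is only an illustration: it assumes the geometry fdisk 
reports (64 heads, 32 sectors, 1030 cylinders), 512-byte sectors, and a 
partition running from phys=(0, 0, 2) to logical=(1029, 63, 32).  The 
helper chs_to_lba and all variable names are made up for this sketch and 
are not part of fdisk.

```python
# Geometry as reported by fdisk in this post.
HEADS = 64
SECTORS_PER_TRACK = 32
CYLINDERS = 1030

# The cylinder field in a DOS partition table entry is 10 bits wide,
# so the largest cylinder number it can hold is 1023.
MAX_TABLE_CYLINDER = 2**10 - 1


def chs_to_lba(cyl, head, sector):
    """Convert a CHS address (sector is 1-based) to a 0-based sector number."""
    return (cyl * HEADS + head) * SECTORS_PER_TRACK + (sector - 1)


sectors_per_cylinder = HEADS * SECTORS_PER_TRACK      # the "2048" in "Units"
total_sectors = CYLINDERS * sectors_per_cylinder

start_lba = chs_to_lba(0, 0, 2)            # where the partition starts
logical_end = chs_to_lba(1029, 63, 32)     # last sector of cylinder 1029
phys_end = chs_to_lba(1023, 63, 32)        # what the table's CHS field says

partition_sectors = logical_end - start_lba + 1
blocks_1k, leftover = divmod(partition_sectors, 2)    # two sectors per 1K block

print("sectors per cylinder:", sectors_per_cylinder)
print("total sectors on disk:", total_sectors)
print("partition sectors:", partition_sectors)
print("1K blocks:", blocks_1k, "+" if leftover else "")
print("sectors beyond the CHS-encodable end:", logical_end - phys_end)
```

Under those assumptions the block count comes out as 1054719 with one 
sector left over, which matches the "1054719+" in the Blocks column, and 
the physical ending stops at cylinder 1023 because the table entry cannot 
encode a higher cylinder number.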


I had the physical and logicals the same at one stage (both 1023 cyls) 
and the problem was as it is now.

/proc/pci -
PCI devices found :
Bus  0 Device  13 Function  0.
    SCSI bus controller : Adaptec 2940 (rev 0). 8259's interrupt 14.
Bus  0 Device  14 Function  0.
    Ethernet controller : DEC DC21040 (rev 35). 8259's interrupt 15.
Bus  0 Device  16 Function  0.
    Host bridge : UMC UM8881F (rev 1). 
Bus  0 Device  18 Function  0.
    ISA bridge : UMC UM8673F (rev 1). 


The AHA2940 has extended translation disabled.  I've tried with and 
without Disconnection enabled, and it doesn't appear to affect the problem.

I don't know much about SCSI, and I could be missing something obvious.  
I have tried known incorrect termination as well as known correct 
termination setups, and could see no change in the system's behaviour.

I've done a low-level format of the drive in the Adaptec BIOS - the 
problem wasn't affected.   Also, the Adaptec BIOS's disk verify routine 
finds no problems.

Any pointers, ideas for tests to try etc would be *severely* 
appreciated.   This computer is supposed to be our champion server and at 
the moment it's not reliable enough because of this problem.

Many thanks in advance.

Regards

----------------------------------------------------------------------
L.Mandelstam - System Administrator              louis@sacc.wn.apc.org
S A Council of Churches, PO Box 4921, Johannesburg, 2000, South Africa
          tel:+27-11-492-1380 x249      fax:+27-11-492-1448             
----------------------------------------------------------------------

