[99689] in RedHat Linux List


Re: is my hd dying soon?

daemon@ATHENA.MIT.EDU (Ramon Gandia)
Mon Nov 16 16:21:49 1998

Date: Mon, 16 Nov 1998 11:51:03 -0900
From: Ramon Gandia <rfg@nook.net>
To: redhat-list@redhat.com
Resent-From: redhat-list@redhat.com
Reply-To: redhat-list@redhat.com



Statux wrote:
> 
> I don't know about dying.. how old is the drive?

Age doesn't matter.  A drive dies when it dies.  It's generating
interrupts.

Normally ... when the drive is healthy - it sits there nice and
quiet.  When the O/S wants to do something, say READ a sector,
it sends the request to the drive.  The drive takes the info
and puts it on the buffer or does a DMA transfer, and then
generates an interrupt on the assigned IRQ.  The interrupt
handler then reads the drive's status for the result code; in this
case it would be the successful completion of the READ command.
The O/S then knows the data it requested is on the buffer or
memory location.

This is different.  The interrupt is NOT generated as the result
of anything the OS did.  It was generated by the DRIVE on its own,
and the status it reports says an internal error has occurred.
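
To make the distinction concrete, here is a toy sketch in Python.  It
is not how a kernel driver is written; the class and method names are
invented for illustration.  The only idea it shows is that an
interrupt arriving while no command is outstanding has nothing to
answer to, so it gets counted as unsolicited.

```python
class DriveMonitor:
    """Toy model: tells command completions from unsolicited interrupts."""

    def __init__(self):
        self.pending = 0       # commands sent to the drive, not yet completed
        self.unsolicited = 0   # interrupts that answered no command

    def send_command(self):
        # The OS asks the drive to do something, e.g. READ a sector.
        self.pending += 1

    def on_interrupt(self):
        # An interrupt with a command outstanding is a normal completion.
        if self.pending > 0:
            self.pending -= 1
            return "completion"
        # Nothing was asked of the drive, so the drive raised this itself.
        self.unsolicited += 1
        return "unsolicited"
```

A healthy drive should keep the unsolicited count at zero; anything
above that is the situation described here.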

Please note that an IDE or SCSI drive has a built-in controller
and electronics.  There is not much that can happen external to
it that would make it generate the error message.  I suppose
bum power or noise spikes might, but it would have to be really
bad, and affecting all the drives and computer.  This does not
sound like it.

This has all the earmarks of early drive failure symptoms.  It
is not a surface data error or a bad sector, it is much more
serious than that.  You only get surface data error messages when
the drive is accessed.  This problem is quite different: it
is being generated spontaneously within the drive.  Perhaps
a servo tracking problem, a bad bearing that has the platter or
head so loose it loses tracking from time to time, or failing
electronics.  Keep in mind that internally, a spinning IDE or
SCSI drive is continually reading servo data (one of the tracks),
and probably also continually reading the track and sector index
marks so it knows where the head(s) are sitting.  It could also
be caused by too much heat, but by the time you get the message
the harm has already been done.

If the drive doesn't contain anything critical to you, just keep
using it.  If it does, an immediate backup or a replacement is in
order.  Keep in mind we are talking hda here, which has LILO on it,
so a failure will mean replacing the drive and booting from a rescue
floppy.

Here at Nook Net we have 13 computers, and whenever one of these
messages comes up, INVARIABLY the drive fails soon enough.
Typically just hours, but occasionally we see one that limps on for
a month or two ( 24 x 7 operation ).  [PS. Just love 3 year
warranties.... we haven't *bought* a drive in over a year now!]

In Windows 95 and Windows NT, these error messages are NOT passed
to the user.  You do not know of a problem until it shows up as
surface data errors, something the drive is equipped to correct
to a great degree.  So when an error finally shows up, the
situation is well along to TERMINAL.

Linux, on the other hand, passes these messages right along to you
on the console or in /var/log/messages.  Why choose to ignore it?
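
If you would rather not rely on spotting them on the console, a small
script can pull the suspicious lines out of the log.  This is only a
sketch: the function names are mine, and the keywords in the pattern
("error", "interrupt") are an assumption, since the exact wording
depends on your kernel and driver.  Adjust the pattern to match what
your system actually logs.

```python
import re

# Assumed pattern: a disk device name (hda..hdh, sda..sdz) followed by
# a colon, then the word "error" or "interrupt" later on the same line.
SUSPECT = re.compile(r"\b(hd[a-h]|sd[a-z]):.*(error|interrupt)", re.IGNORECASE)


def suspect_lines(lines):
    """Return the lines that look like drive complaints."""
    return [line.rstrip("\n") for line in lines if SUSPECT.search(line)]


def scan_log(path="/var/log/messages"):
    """Scan a syslog file; reading it usually needs root."""
    with open(path, errors="replace") as log:
        return suspect_lines(log)
```

Run scan_log() from cron and mail yourself whatever it returns, and
you will hear about a sick drive before it quits.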


-- 
Ramon Gandia   -----   Owner & Sysadmin, Nook Net
P.O. Box 970, Nome, AK 99762  http://www.nook.net
Nome: 907-443-7575       Unalakleet: 907-624-5080
Fax:  907-443-2487    AK. Toll Free: 888-443-7525
=================================================


