[72913] in North American Network Operators' Group
Re: Quick question.
daemon@ATHENA.MIT.EDU (Paul Jakma)
Sun Aug 1 18:46:31 2004
Date: Sun, 1 Aug 2004 23:45:42 +0100 (IST)
From: Paul Jakma <paul@clubi.ie>
To: John Underhill <stepnwlf@magma.ca>
Cc: Michel Py <michel@arneill-py.sacramento.ca.us>,
Alexei Roudnev <alex@relcom.net>, Nanog <nanog@nanog.org>
In-Reply-To: <000501c477fa$cc83de20$fe28a8c0@proxyvstar978>
Errors-To: owner-nanog-outgoing@merit.edu
On Sun, 1 Aug 2004, John Underhill wrote:
> Not necessarily. There have been a number of innovations in recent years in
> the area of integrated fault tolerance, including bios level controls over
> component monitoring / management. Some of the more upscale Compaq G3
> servers for instance, can remove a processor from operation if it exceeds a
> threshold of critical errors, (this is also true for memory).
Interesting to know. Those are usually due to ECC errors in CPU
caches, often caused by overheating. The CPU is still functional to a
degree though: a marginal failure as opposed to a catastrophic one.
But what of electrical failures? Even P4-class machines still share a
host bus amongst CPUs, no?
Anyway, CPUs (if kept sufficiently cool) tend to be one of the more
reliable components in a system, if they are good to begin with.
> Alphas can boot even if the bootstrap processor fails at system
> start, and simply selects the next available processor..
Alphas are quite nice, they have support for lockstep operation too.
Tandem were supposed to have been moving to Alpha for their Himalaya
F-T servers once Compaq, their owner, bought DEC. Also the 21164 and up (not sure
about 21064) AXPs used a point-to-point bus for SMP[1]; they were all
electrically isolated from each other - at least, a failure of one
CPU couldn't affect the other CPUs.
> So that if a process runs amock on a single bus architecture, the
> second processor will not have the resources it needs to run
> effectively..
Processes running amok still only have access to the resources
granted to them. Processes generally do not have access to bare I/O. What
the OS giveth, it can take away (or constrain).
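
As a rough illustration of the OS constraining a runaway process, here is
a minimal sketch using POSIX resource limits from Python. It assumes a
Unix-like system; the helper name run_with_cpu_limit and the one-second
cap are made up for the example, not taken from any system discussed
here.

```python
import resource
import subprocess
import sys


def run_with_cpu_limit(code: str, cpu_seconds: int) -> int:
    """Run `code` in a child Python process with a soft CPU-time cap.

    Hypothetical helper for illustration only (Unix-like systems).
    Returns the child's return code; a negative value means the child
    was terminated by a signal.
    """
    def limit() -> None:
        # Runs in the child just before exec: lower the soft CPU limit
        # and leave the hard limit as it was.
        _, hard = resource.getrlimit(resource.RLIMIT_CPU)
        resource.setrlimit(resource.RLIMIT_CPU, (cpu_seconds, hard))

    proc = subprocess.run([sys.executable, "-c", code], preexec_fn=limit)
    return proc.returncode


if __name__ == "__main__":
    # A well-behaved process is unaffected by the cap.
    print(run_with_cpu_limit("pass", 1))
    # A busy loop is signalled by the kernel once it exceeds the cap.
    print(run_with_cpu_limit("while True: pass", 1))
```

The runaway child is stopped by the kernel rather than by anything the
child itself does, which is the point: the limit is enforced from
outside the process.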
1. Still alive and well in a sense, but now developed into a general
purpose PtP local CPU/IO interconnect: AMD's HyperTransport, as used
in K8.
regards,
--
Paul Jakma paul@clubi.ie paul@jakma.org Key ID: 64A2FF6A
Fortune:
Don't get stuck in a closet -- wear yourself out.