[72967] in North American Network Operators' Group



Re: Quick question.

daemon@ATHENA.MIT.EDU (Paul G)
Wed Aug 4 03:18:51 2004

From: "Paul G" <paul@rusko.us>
To: <nanog@merit.edu>
Date: Wed, 4 Aug 2004 03:13:45 -0400
Errors-To: owner-nanog-outgoing@merit.edu

----- Original Message ----- 
From: "Paul Jakma" <paul@clubi.ie>
To: "Paul G" <paul@rusko.us>
Cc: <nanog@merit.edu>
Sent: Wednesday, August 04, 2004 3:09 AM
Subject: Re: Quick question.

>
> On Wed, 4 Aug 2004, Paul G wrote:
>
> > the second cpu buys you time - it is unlikely you're going to be
> > able to react in time on a busy single cpu box with a runaway
> > process (it launches into a death spiral almost immediately), but
> > you would usually have 10-15 mins on a dual cpu box at a minimum or
> > maybe infinity if you enforce cpu affinity for apps that tend to
> > misbehave.
>
> Why do you have 10-15 mins? If the application is multi-threaded and
> has a reasonable workload, there are plenty of types of bugs that
> will result in one spinning thread after the other; you need far
> more than just 2 CPUs! Or maybe your application vendor has "at least
> 10 minutes between hitting bugs!" on its feature list? ;)

these are observations pertaining to software products we use a lot -
apache, mysql, apache/suexec, various mtas etc. your point is well taken in
general, but at least When Done Here(tm), a second cpu helps significantly,
empirically speaking.
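
for reference, the cpu affinity idea from the quoted text could look roughly
like the sketch below. it assumes linux and python 3 (os.sched_setaffinity is
not available everywhere), and run_pinned is a made-up helper name, not
something we actually run:

```python
import os
import subprocess


def run_pinned(argv, cpus):
    """Run argv to completion, restricted to the CPU numbers in `cpus`.

    Hypothetical helper for illustration; Linux only.
    """
    def pin():
        # Executed in the child after fork and before exec, so the
        # program never runs outside the allowed CPU set.
        # pid 0 means "the calling process", i.e. the child itself.
        os.sched_setaffinity(0, cpus)

    return subprocess.run(argv, preexec_fn=pin, capture_output=True, text=True)
```

a runaway process started this way can saturate only the cpus it was pinned
to, which leaves the rest of the box responsive enough to log in and kill it.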

> Really, what you need to do (in the face of such buggy apps) is to
> set per-task CPU time resource limits appropriate to how much
> cpu-time a task needs and how much you can afford - be it a 1, 2 or n
> CPU system.

agreed. however, this degrades performance in certain situations, is not
practical in others and introduces additional complexity (always a bad
thing). the tradeoff is significantly in favor of reactive measures (be they
automatic or human intervention), at least in most of our installations.
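
to make the per-task limit concrete, here is a minimal sketch of the approach
suggested in the quoted text, assuming a unix system and python 3.
run_with_cpu_limit is a hypothetical name, and the limit value is something
you would have to pick per application:

```python
import resource
import subprocess


def run_with_cpu_limit(argv, cpu_seconds):
    """Run argv with a cap on consumed CPU time (not wall-clock time).

    Hypothetical helper for illustration; Unix only.
    Returns the child's exit status; a negative value means the child
    was terminated by a signal.
    """
    def apply_limit():
        # Executed in the child after fork and before exec.
        # Soft and hard limit are set to the same value, so the kernel
        # terminates the process once it has used that much CPU time.
        resource.setrlimit(resource.RLIMIT_CPU, (cpu_seconds, cpu_seconds))

    return subprocess.run(argv, preexec_fn=apply_limit).returncode
```

a task that idles never hits the limit, since only cpu time actually consumed
counts. the flip side is the complexity mentioned above: a legitimately busy
long-running daemon will eventually be killed too unless the limit is tuned
for it.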

paul

