[46479] in Hotline Meeting


Re: Case 132270: 2-032 cluster

daemon@ATHENA.MIT.EDU (Mike Whitson)
Mon Feb 1 15:25:30 1999

To: jjmorey@MIT.EDU
Cc: ops@MIT.EDU, hotline@MIT.EDU
From: Mike Whitson <mwhitson@MIT.EDU>
Date: 01 Feb 1999 15:25:27 -0500
In-Reply-To: hotline@MIT.EDU's message of Mon, 01 Feb 1999 08:48:33 EST

I have to apologize for not getting back to you when you paged about
this; I made a mistake when I took ASO pager duty and misconfigured
the vmail on x3-2624, so I was never paged when you left vmail.

If something like this happens again, the following information is
from the file /mit/ops/doc/Pager.Info, which describes procedures for
paging Athena Ops:

> Escalation: 

> How to escalate a problem when (for one reason or another) the person
> on call does not respond.

> After the initial page, wait at least 30 minutes and page them
> again.  If you left voicemail to page the person on call, then you
> should (if you are able) look up who it is from
> /mit/ops/doc/Pager.Schedule and page them using 'beep' in the
> net-tools locker.  This way if there is a voicemail problem they
> will be able to get the page.

> If there is no response to the second page, you should again wait at
> least 30 minutes and then page the whole Athena Server Operations
> group.  This is accomplished with the 'beep ops-group'
> command. You should put the word 'ESCALATE' as the first word so
> that everyone knows that there is a problem that the person on call
> has not responded to.

> For example, an escalation to the rest of the group might look like this:

> % add net-tools
> % beep ops-group
> Type your message now.  End with control-D or a dot on a line by itself.
> (limitation of 60 characters enforced by metromedia)
>                  > ESCALATE file server ixion is down pls call 3-4435
> paging...
> ops-group: paged.

> Keep in mind that 'beep ops-group' pages everyone and should not be done
> lightly.  
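
As a rough illustration of the two constraints in the example above (the
word 'ESCALATE' first, and the 60-character limit shown in the beep
prompt), here is a minimal Python sketch that checks a message before it
is typed into 'beep'.  The function name and constants are hypothetical,
and it assumes that the limit counts every character of the message and
that exactly 60 is allowed; it does not call 'beep' itself.

```python
MAX_PAGE_CHARS = 60           # limit quoted in the beep prompt (assumed inclusive)
ESCALATE_PREFIX = "ESCALATE"  # required first word for a group escalation


def compose_escalation(text: str) -> str:
    """Return a page body that starts with ESCALATE and fits the limit.

    Hypothetical helper: it only builds and validates the string.
    """
    words = text.split()
    # Add the prefix only if the caller has not already supplied it.
    if not words or words[0] != ESCALATE_PREFIX:
        words.insert(0, ESCALATE_PREFIX)
    message = " ".join(words)
    if len(message) > MAX_PAGE_CHARS:
        raise ValueError(
            f"message is {len(message)} characters; limit is {MAX_PAGE_CHARS}"
        )
    return message


if __name__ == "__main__":
    print(compose_escalation("file server ixion is down pls call 3-4435"))
```

Rejecting an over-long message outright, rather than truncating it, is a
deliberate choice here: a silently cut-off callback number would be worse
than having to shorten the text by hand.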

> In summary, in the event of an emergency while no one from Athena
> Server Operations is around:
>          - Page the person on duty (via urgent vmail to x3-2624, or 'beep')
>          - Page again, if no answer after 30 minutes (preferably using 'beep')
>          - Page ops-group, if still no answer after another 30 minutes
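
To make the timing concrete, the three steps in the summary can be
sketched as a small Python function that returns the earliest time for
each step.  This is only an illustration: the function name is
hypothetical, and since the procedure says to wait "at least" 30 minutes,
the times it gives are lower bounds rather than deadlines.

```python
from datetime import datetime, timedelta

WAIT = timedelta(minutes=30)  # minimum wait between steps, per the procedure


def escalation_schedule(first_page: datetime):
    """Return (earliest time, action) pairs for the three escalation steps."""
    return [
        (first_page, "page the person on duty (urgent vmail to x3-2624, or beep)"),
        (first_page + WAIT, "page the person on duty again, preferably with beep"),
        (first_page + 2 * WAIT, "beep ops-group with ESCALATE as the first word"),
    ]


if __name__ == "__main__":
    # Hypothetical start time, chosen only for illustration.
    for when, action in escalation_schedule(datetime(1999, 2, 1, 8, 0)):
        print(when.strftime("%H:%M"), action)
```

With a first page at 08:00, the second page is due no earlier than 08:30
and the group page no earlier than 09:00.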

In particular, I would still have received the page if it had gone out
using 'beep' rather than the voicemail system.

I'm told that it turned out to be a network problem and was fixed, so
at least all's well that ends well.  Again, I'm very sorry I didn't
get back to you in a timely fashion.

Mike Whitson
MIT/IS Athena Server Operations
