[190813] in North American Network Operators' Group
Re: Operations task management software?
daemon@ATHENA.MIT.EDU (Lee)
Wed Jul 27 20:20:33 2016
In-Reply-To: <712E5359-5217-41AA-A779-F13DCE597537@dino.hostasaurus.com>
From: Lee <ler762@gmail.com>
Date: Wed, 27 Jul 2016 20:20:29 -0400
To: David Hubbard <dhubbard@dino.hostasaurus.com>
Cc: "nanog@nanog.org" <nanog@nanog.org>
On 7/27/16, David Hubbard <dhubbard@dino.hostasaurus.com> wrote:
> Full automation is planned but does not eliminate the need for the software.
> Zero human auditing of fully automated processes and data collection are
> not acceptable to various certifying entities, the relevant auditors, the
> inevitably involved lawyers, and won't pick up on bad data, like a bad
> thermometer or snmp counter that says a CRAC is 65 degrees when it's really
> 90. So I'm still going to need a management solution to the issue whether
> it's to tell someone to do the work or to tell someone to check the
> automated work.
You have a ticketing system - right? Create a cron job that creates a
ticket to check whatever.
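A minimal sketch of the cron-plus-ticket idea, assuming the ticketing system can open a ticket from an inbound email (Request Tracker does this via rt-mailgate, and many others can be configured the same way). The script name make_ticket.sh, the TICKET_ADDR address and the MAILER command are hypothetical placeholders, not part of any particular product.

```shell
#!/bin/sh
# make_ticket.sh - hypothetical helper: open a "go check this" ticket by
# emailing the ticket system's inbound address. Assumes the ticketing
# system creates tickets from email; TICKET_ADDR and MAILER are
# placeholders to adjust for your site.
#
# Example crontab entry, 09:00 on the 1st and 15th of each month:
# 0 9 1,15 * * /usr/local/bin/make_ticket.sh "Verify rancid backups" "Confirm every router config was backed up."
TICKET_ADDR=${TICKET_ADDR:-netops-queue@example.com}
MAILER=${MAILER:-"/usr/sbin/sendmail -t"}

# make_ticket SUBJECT BODY: build a minimal mail message and pipe it to
# the mailer, which takes the recipient from the To: header (-t).
make_ticket() {
    printf 'To: %s\nSubject: %s\n\n%s\n' "$TICKET_ADDR" "$1" "$2" | $MAILER
}

# Only send when called with arguments, so the file can also be sourced.
if [ $# -ge 1 ]; then
    make_ticket "$1" "${2:-}"
fi
```

The human still closes the ticket, which gives the auditors their record that someone looked at the automated work.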
Regards,
Lee
>
> David
>
> On 7/27/16, 7:19 PM, "Lee" <ler762@gmail.com> wrote:
>
> On 7/27/16, David Hubbard <dhubbard@dino.hostasaurus.com> wrote:
> > Hi all, curious if anyone has recommendations on software that helps
> > manage routine duties assigned to operations staff?
>
> Have computers do the routine scut work - not people.
>
> > For example, let's say we have a P&P that says someone from the netops
> > group must check that Rancid is successfully backing up all router configs
> > bi-weekly.
>
> You've got the source code for rancid, so change rancid-run to do
> something like
> LOGFILE=$LOGDIR/$GROUP.`date +%Y%m%d.%H%M%S`; export LOGFILE
> and change the line
> ) >$LOGDIR/$GROUP.`date +%Y%m%d.%H%M%S` 2>&1
> to
> ) >$LOGFILE 2>&1
>
> and then in control_rancid do something like
> grep "clogin error:" "$LOGFILE" | sort | uniq -c >$TMP.fail
> if [ -s $TMP.fail ]; then
>     # got some output, mail the report
>     ...
> fi
>
> Do the same type thing for checking on
> > backup failures, backup internet circuit status, out of band
> > interfaces, etc.
>
> Automate the checks, put the scripts in crontab & mail out an
> "OhNoes!" or "all clear" msg at the end. At which point you're left
> with the problem of making sure the managers are looking at the emails
> & making sure whatever problems are found actually get fixed :)
>
> Regards,
> Lee
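The grep | sort | uniq -c check from the quoted message can be packaged as a small function. This is a minimal sketch, assuming the modified rancid-run exports LOGFILE as described and that failed logins show up as lines containing "clogin error:"; the function name rancid_failures is hypothetical. It keeps the same pipeline but holds the summary in a variable instead of a $TMP.fail file.

```shell
#!/bin/sh
# rancid_failures LOGFILE  (hypothetical helper)
# Prints one line per distinct "clogin error:" message, prefixed by its
# count, and returns 1 if any were found, 0 if the log is clean, or 2 if
# the log cannot be read (so a missing log is not mistaken for "clean").
rancid_failures() {
    if [ ! -r "$1" ]; then
        echo "cannot read rancid log: $1" >&2
        return 2
    fi
    # Same pipeline as the control_rancid snippet above.
    failures=$(grep "clogin error:" "$1" | sort | uniq -c)
    if [ -n "$failures" ]; then
        printf '%s\n' "$failures"
        return 1
    fi
    return 0
}
```

Returning a distinct status for an unreadable log matters here: a check that silently passes when rancid never wrote its log is exactly the kind of bad data a human reviewer would otherwise have to catch.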
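The "automate the checks, run them from crontab, mail an OhNoes! or all clear message" pattern can be sketched as a generic wrapper. Everything here is an assumption for illustration: the script name nightly_checks.sh, the REPORT_ADDR and MAILER defaults, and the example check commands. Each check is simply a shell command that exits 0 when all is well, so the rancid log check, a ping of an out-of-band gateway, or a test on a backup file can all be plugged in.

```shell
#!/bin/sh
# nightly_checks.sh - hypothetical wrapper: run each check, then mail one
# summary. REPORT_ADDR, MAILER and the example checks are assumptions.
#
# Example crontab entry, daily at 06:30 (hostnames/paths are made up):
# 30 6 * * * /usr/local/bin/nightly_checks.sh "ping -c 3 oob-gw.example.net" "test -s /var/backups/latest.tar.gz"
REPORT_ADDR=${REPORT_ADDR:-netops-managers@example.com}
MAILER=${MAILER:-"/usr/sbin/sendmail -t"}

# run_checks CMD...: run each command string with sh -c, remember the
# ones that exit non-zero, and send a single summary message.
run_checks() {
    failed=""
    for check in "$@"; do
        if ! sh -c "$check" >/dev/null 2>&1; then
            failed="$failed$check
"
        fi
    done
    if [ -n "$failed" ]; then
        subject="OhNoes! checks failed"
        body="Failed checks:
$failed"
    else
        subject="all clear"
        body="All $# checks passed."
    fi
    printf 'To: %s\nSubject: %s\n\n%s\n' "$REPORT_ADDR" "$subject" "$body" | $MAILER
}

# Only run when given checks, so the file can also be sourced.
if [ $# -gt 0 ]; then
    run_checks "$@"
fi
```

Sending a summary even when everything passes is deliberate: if the "all clear" stops arriving, that absence is itself a sign that cron or the mailer has broken.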