[190903] in North American Network Operators' Group
Re: Operations task management software?
daemon@ATHENA.MIT.EDU (Matt Ryanczak)
Tue Aug 2 09:59:15 2016
X-Original-To: nanog@nanog.org
In-Reply-To: <712E5359-5217-41AA-A779-F13DCE597537@dino.hostasaurus.com>
From: Matt Ryanczak <ryanczak@gmail.com>
Date: Tue, 02 Aug 2016 13:59:01 +0000
To: David Hubbard <dhubbard@dino.hostasaurus.com>,
"nanog@nanog.org" <nanog@nanog.org>
Errors-To: nanog-bounces@nanog.org
Jira works well as a task tracking system for ops. Customizable workflows,
decent integration with LDAP, etc. Also good for tracking software
projects. Having both software and ops tasks in one place has many benefits.
On Wed, Jul 27, 2016, 16:28 David Hubbard <dhubbard@dino.hostasaurus.com>
wrote:
> Full automation is planned but does not eliminate the need for the
> software. Zero human auditing of fully automated processes and data
> collection are not acceptable to various certifying entities, the relevant
> auditors, the inevitably involved lawyers, and won’t pick up on bad data,
> like a bad thermometer or snmp counter that says a CRAC is 65 degrees when
> it’s really 90. So I’m still going to need a management solution to the
> issue whether it’s to tell someone to do the work or to tell someone to
> check the automated work.
>
> David
>
> On 7/27/16, 7:19 PM, "Lee" <ler762@gmail.com> wrote:
>
> On 7/27/16, David Hubbard <dhubbard@dino.hostasaurus.com> wrote:
> > Hi all, curious if anyone has recommendations on software that helps manage
> > routine duties assigned to operations staff?
>
> Have computers do the routine scut work - not people.
>
> > For example, let’s say we have a P&P that says someone from the netops group
> > must check that Rancid is successfully backing up all router configs
> > bi-weekly.
>
> You've got the source code for rancid, so change rancid-run to do
> something like
> LOGFILE=$LOGDIR/$GROUP.`date +%Y%m%d.%H%M%S`; export LOGFILE
> change the
> ) >$LOGDIR/$GROUP.`date +%Y%m%d.%H%M%S` 2>&1
> to
> ) >$LOGFILE 2>&1
>
> and then in control_rancid do something like
> grep "clogin error:" $LOGFILE | sort | uniq -c >$TMP.fail
> if [ -s $TMP.fail ]; then
> # got some output, mail the report
> ...
>
> Do the same type thing for checking on
> > backup failures, backup internet circuit status, out of band interfaces, etc.
>
> Automate the checks, put the scripts in crontab & mail out an
> "OhNoes!" or "all clear" msg at the end. At which point you're left
> with the problem of making sure the managers are looking at the emails
> & making sure whatever problems are found actually get fixed :)
>
> Regards,
> Lee
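
For anyone who wants to try the check Lee describes, here is a rough sketch of how the "grep the log, then mail a report" step might look as a standalone script. It is not rancid's own code: the function names summarize_failures and build_report are made up for illustration, the subject wording is borrowed from the thread, and it assumes failures show up in the log as lines containing "clogin error:". One addition beyond the quoted snippet is that a missing or unreadable log is reported as a problem, since an empty grep result would otherwise look like "all clear".

```sh
#!/bin/sh
# Sketch only: summarize_failures and build_report are hypothetical helpers,
# not part of rancid. Assumes failed logins appear as "clogin error:" lines.

# Print each distinct failure line from the log, prefixed by how often it occurs.
summarize_failures() {
    grep "clogin error:" "$1" | sort | uniq -c
}

# Print a subject line, followed by the failure summary when there is one.
# $1 = rancid log file, $2 = group name (both supplied by the caller).
build_report() {
    logfile=$1
    group=$2

    # A log that is missing or unreadable must not be mistaken for a clean run.
    if [ ! -r "$logfile" ]; then
        printf 'Subject: OhNoes! no readable rancid log for %s\n' "$group"
        return 0
    fi

    failures=$(summarize_failures "$logfile")
    if [ -n "$failures" ]; then
        printf 'Subject: OhNoes! rancid failures in %s\n\n%s\n' "$group" "$failures"
    else
        printf 'Subject: all clear for %s\n' "$group"
    fi
}
```

Something like build_report "$LOGFILE" "$GROUP" could then be run from cron after each collection and its output handed to whatever mailer the site already uses. Sending the "all clear" message as well means a report that never arrives is itself a signal worth chasing.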