[41374] in Resnet-Forum
Re: System Status Website Recommendations
daemon@ATHENA.MIT.EDU (Paul Seward)
Wed Sep 7 15:14:54 2016
Message-ID: <CAKzNK-YMAZtt59OQwvPasGRD_xQVF58A6R93JyRfxuYxrzZsHw@mail.gmail.com>
Date: Wed, 7 Sep 2016 17:21:29 +0100
Reply-To: Paul.Seward@BRISTOL.AC.UK
From: Paul Seward <Paul.Seward@BRISTOL.AC.UK>
To: RESNET-L@listserv.nd.edu
In-Reply-To: <B5A88413-BA3C-4C73-909C-098BF0CA22E9@oit.rutgers.edu>
On 7 September 2016 at 16:18, Brian Luper <bluper@oit.rutgers.edu> wrote:
>
>
> We use Nagios for internal monitoring as well. We don’t have a user-facing
> status website yet. As Adam was saying, translating all the potential
> alerts into a user-friendly dashboard is a challenge.
>
We've been running a manually updated web page for years detailing major
alerts and scheduled maintenance. We're in the process of moving that to
http://status.io because it makes it easier to delegate status updates.
That's due to go live in about 10 days' time...
status.io also has a reasonably friendly API, and we've done some
proof-of-concept work to pull status out of our Nagios/Icinga instances
(via MK Livestatus, https://mathias-kettner.de/checkmk_livestatus.html, for
Nagios; equivalent functionality is built into Icinga), but we're launching
initially in "manual updates only" mode.
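For illustration, here is a minimal Python sketch of pulling host state from Nagios over MK Livestatus. It is not the Bristol proof of concept itself. It assumes two hypothetical details that will differ per site:

- a Livestatus Unix socket at a made-up path;
- a Nagios host group called access-points.

```python
import json
import socket

# Hypothetical socket path: the real location depends on how Nagios and
# MK Livestatus were built and configured on the monitoring host.
LIVESTATUS_SOCKET = "/var/lib/nagios/rw/live"


def build_query(host_group: str) -> str:
    """Build an LQL query for the name and state of every host in a group.

    A Livestatus request is plain text terminated by a blank line. For list
    columns such as "groups", the ">=" operator means "contains".
    """
    return (
        "GET hosts\n"
        "Columns: name state\n"
        f"Filter: groups >= {host_group}\n"
        "OutputFormat: json\n"
        "\n"
    )


def parse_hosts(raw: bytes) -> dict[str, int]:
    """Turn a JSON reply ([[name, state], ...]) into {name: state}.

    Nagios host states: 0 = UP, 1 = DOWN, 2 = UNREACHABLE. Because explicit
    Columns are requested, the reply carries no header row.
    """
    return {name: state for name, state in json.loads(raw)}


def query_livestatus(query: str, path: str = LIVESTATUS_SOCKET) -> bytes:
    """Send one query over the Unix socket and read the reply until EOF."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(path)
        sock.sendall(query.encode("utf-8"))
        sock.shutdown(socket.SHUT_WR)  # tell Livestatus the request is complete
        chunks = []
        while chunk := sock.recv(4096):
            chunks.append(chunk)
    return b"".join(chunks)
```

Calling parse_hosts(query_livestatus(build_query("access-points"))) would then give a mapping of AP names to host states for a rules layer to work with.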
Our plan is to come up with some rough-and-ready business rules like "x% of
APs are down in Nagios, report that as degraded via status.io" and "test
authentications via our RADIUS servers are failing, report that as an
outage", and then push those reports to status.io via the API.
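The two example rules could be sketched in Python roughly as follows. Several details are assumptions:

- The 10% threshold is a hypothetical stand-in for "x%".
- The numeric codes are assumed to match status.io's component status values.
- The endpoint, header names and body fields in build_update_request are assumed from status.io's v2 REST API. Check them against its documentation before use.

The request is only built here, not sent.

```python
import json
import urllib.request
from enum import IntEnum

# Hypothetical threshold standing in for the "x%" in the rule above.
AP_DEGRADED_FRACTION = 0.10


class Status(IntEnum):
    """Public-facing states, ordered so that max() picks the worst one.

    Values are ASSUMED to match status.io component status codes
    (100 operational, 300 degraded performance, 500 service disruption).
    """
    OPERATIONAL = 100
    DEGRADED = 300
    OUTAGE = 500


def wireless_status(aps_down: int, aps_total: int) -> Status:
    """Rule: if at least x% of APs are down in Nagios, report degraded."""
    if aps_total and aps_down / aps_total >= AP_DEGRADED_FRACTION:
        return Status.DEGRADED
    return Status.OPERATIONAL


def radius_status(test_auth_ok: bool) -> Status:
    """Rule: if test authentications via RADIUS fail, report an outage."""
    return Status.OPERATIONAL if test_auth_ok else Status.OUTAGE


def build_update_request(api_id: str, api_key: str, statuspage_id: str,
                         component: str, container: str, status: Status,
                         details: str) -> urllib.request.Request:
    """Prepare (but do not send) a component status update.

    ASSUMPTION: URL, header names and body fields follow status.io's v2 API.
    """
    body = {
        "statuspage_id": statuspage_id,
        "component": component,
        "container": container,
        "details": details,
        "current_status": int(status),
    }
    return urllib.request.Request(
        "https://api.status.io/v2/component/status/update",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "x-api-id": api_id,
            "x-api-key": api_key,
        },
        method="POST",
    )
```

Ordering the states numerically means max() over several rule results yields the worst one, which is usually what a public status page should show. Once real credentials are in place, the prepared request can be sent with urllib.request.urlopen().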
The public don't need the detail we get from Nagios, so we can be fuzzy and
handwavy when we're distilling the alerts.
Also, I don't believe the translation needs to be perfect and cover every
possible case out of the gate. Handle the obvious cases, then every time
you hit something that fails in a way you don't handle, add that to the
reporting layer so you're covered next time.
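One way to structure that incremental approach is sketched below in Python. It keeps an ordered list of rules, each mapping matching Nagios alerts to a public message. A catch-all collects any CRITICAL or UNKNOWN alert that no rule covers yet, so someone can write a rule for it. The Alert, Rule and Reporter types, and any host or service names used with them, are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Alert:
    host: str
    service: str
    state: int  # Nagios service state: 0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN


@dataclass
class Rule:
    name: str
    matches: Callable[[Alert], bool]  # does this rule cover the alert?
    public_message: str               # fuzzy, user-facing wording


@dataclass
class Reporter:
    rules: list[Rule] = field(default_factory=list)
    unhandled: list[Alert] = field(default_factory=list)

    def add_rule(self, rule: Rule) -> None:
        """Each newly discovered failure mode becomes one more rule."""
        self.rules.append(rule)

    def report(self, alerts: list[Alert]) -> list[str]:
        """Return de-duplicated public messages for the given alerts."""
        messages: list[str] = []
        for alert in alerts:
            # First matching rule wins, so more specific rules go first.
            rule = next((r for r in self.rules if r.matches(alert)), None)
            if rule is not None:
                if rule.public_message not in messages:
                    messages.append(rule.public_message)
            elif alert.state >= 2:
                # CRITICAL or UNKNOWN with no rule yet: keep it for staff
                # to review, then add a rule so it is handled next time.
                self.unhandled.append(alert)
        return messages
```

After an incident, reviewing reporter.unhandled shows which failure modes still need a rule.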
status.io also allows you to push updates out to mailing lists, Twitter,
etc., so if people want live status they can opt in to those mechanisms too,
which we couldn't do with our web page.
-Paul
-- 
----------------------------------------------------------------------
Paul Seward, Senior Systems Administrator, University of Bristol
Paul.Seward@bristol.ac.uk +44 (0)117 39 41148 GPG Key ID: E24DA8A2
GPG Fingerprint: 7210 4E4A B5FC 7D9C 39F8 5C3C 6759 3937 E24D A8A2
___________________________________________________
You are subscribed to the ResNet-L mailing list.
To subscribe, unsubscribe or search the archives,
go to http://LISTSERV.ND.EDU/archives/resnet-l.html
___________________________________________________