[189796] in North American Network Operators' Group



Re: Monitoring system recommendation

daemon@ATHENA.MIT.EDU (Matthew Pounsett)
Tue Jun 7 01:07:48 2016

X-Original-To: nanog@nanog.org
In-Reply-To: <CAD0TWZ8i-Y9cqWZ9irM15BH2QrMRpBAhFOe39D5eFPhhpy3NSw@mail.gmail.com>
Date: Mon, 6 Jun 2016 10:17:39 -0700
From: Matthew Pounsett <matt@conundrum.com>
To: Manuel Marín <mmg@transtelco.net>
Cc: NANOG <nanog@nanog.org>
Errors-To: nanog-bounces@nanog.org

On 6 June 2016 at 07:18, Manuel Marín <mmg@transtelco.net> wrote:

> Dear Nanog community
>
> We are currently planning to upgrade our monitoring system (Opsview) due to
> scalability issues and I was wondering what do you recommend for monitoring
> 5000 hosts and 35000 services. We would like to use a monitoring system
> that is compatible with the nagios plugin format, however we are not sure
> if systems like Icinga/Shinken/Op5 are the way to go.
>
> Is someone using systems like Op5 or Icinga2 for monitoring > 5000 hosts?
> Would you recommend commercial systems like Sevone, Zabbix, etc instead of
> open source ones?
>

Although I haven't ever scaled it that high, I've had a lot of luck using
Gearman (mod_gearman) to make Nagios horizontally scalable.

It allows you to use Nagios itself only as a scheduler and reporting UI,
and offload all of the actual probing to other servers.  There'll be a
theoretical limit to the amount of scale you can get out of that, since it
still relies on a single Nagios instance to schedule checks and receive
their results, but I imagine it's much higher than your current requirements.
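
To make the split concrete, here is a rough sketch of the pattern in Python.
It is not mod_gearman's API or configuration: it uses only the standard
library, with in-process queues standing in for the Gearman job server and
threads standing in for the remote worker hosts. The names run_check, worker
and schedule, and the "down-" host prefix, are made up for illustration.

```python
import queue
import threading


def run_check(host):
    # Hypothetical probe standing in for a Nagios plugin run.
    # Returns a Nagios-style exit code: 0 = OK, 2 = CRITICAL.
    return 2 if host.startswith("down-") else 0


def worker(jobs, results):
    # A worker only pulls jobs and pushes results; it does no scheduling.
    while True:
        host = jobs.get()
        if host is None:  # sentinel: no more work for this worker
            break
        results.put((host, run_check(host)))


def schedule(hosts, num_workers=3):
    # The single scheduler: enqueue every check, then collect every result.
    jobs = queue.Queue()
    results = queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(jobs, results))
        for _ in range(num_workers)
    ]
    for t in threads:
        t.start()
    for host in hosts:
        jobs.put(host)
    for _ in threads:
        jobs.put(None)  # one sentinel per worker
    for t in threads:
        t.join()
    collected = {}
    while not results.empty():
        host, status = results.get()
        collected[host] = status
    return collected


if __name__ == "__main__":
    print(schedule(["router1", "down-switch2", "web3"]))
```

Adding workers raises probing capacity without touching the scheduler, which
is where the eventual ceiling sits: one process still has to enqueue every
check and consume every result.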

